Inferring Examinee Ability When Some Item Responses Are Missing



Abstract 

The basic equations of item response theory (IRT) provide a 
foundation for inferring examinees' abilities and items' 
operating characteristics from observed responses. In practice, 
though, examinees will usually not have provided a response to
every available item, for reasons that may or may not have been
intended by the test administrator, and that may or may not be
related to examinee ability. The mechanisms that produce
missingness must be taken into account if correct inferences are 
to be drawn. Using concepts introduced by Rubin (1976), we 
discuss the implications for ability and item parameter 
estimation that are entailed by alternate test forms, targeted 
testing, adaptive testing, time limits, and omitted responses. 

Key words: Adaptive testing; Item response theory; Missing
data; Omitted responses; Targeted testing

Introduction



The capability to measure different examinees with different 
test items is an oft-cited advantage of item response theory 
(IRT). This option implies a problem of inference in the presence 
of missing data, since an examinee may not have provided a 
response to every item in the complete item set. Five types of 
missingness are in fact encountered regularly in routine 
applications of IRT: 

Case 1: Alternate test forms. Two or more tests with similar
content but different items are often employed to minimize
carry-over effects (as in test-retest designs), reduce fatigue and
practice effects (by splitting a test into shorter subtests), or
avoid cheating behavior. An examinee is typically administered
one form selected at random.

Case 2: Targeted testing. Two or more tests with similar
content, but pitched at different levels of difficulty, can be 
used to make testing more efficient when background information 
(such as grade or courses taken) is available for deciding which 
test to administer to each examinee. 

Case 3: Adaptive testing. Testing can also be made more 
efficient and less time-consuming if each item presented to an 
examinee is selected on the basis of his responses up to that 
point, and possibly background information as well.

Case 4: Not-reached items. Under typical testing conditions,
some examinees will not reach the last few items on a test 
because of the time limit. 

Case 5: Omitted items. Even when an item has been presented to 
an examinee and he has time to reach it, he will sometimes choose 
not to respond. 

When incomplete data of any of these types are encountered, 
the IRT model that presumably accounts for the responses that are
observed is embedded in a more encompassing model that determines
which responses will be observed and which will be missing. This 
paper discusses the implications that missing responses hold for 
likelihood and Bayesian inferences about examinee ability 
parameters and item parameters, assuming an IRT model holds. When 
can the process that causes missingness be ignored? When it 
cannot be ignored, how can it be modeled? How can conventional 
IRT methods for missing responses be evaluated in this framework? 

The following section extends IRT notation to handle
missingness, using concepts and notation from Little and Rubin
(1987) and Rubin (1976). Next, Rubin's (1976) conditions for
when the missingness process can be ignored are reviewed. Each
of the five types of missingness listed above is then discussed
in some detail in the problem of inferring ability when item
parameters are known. This is followed by the extension to item
parameter estimation. A final section summarizes our results.



Background and Notation 

At the heart of IRT is the model for the response to item j,
with its possibly vector-valued parameter $\beta_j$, from an examinee
with ability $\theta$. The Rasch model for dichotomous items, for
example, posits

$$P(U_j = u_j \mid \theta, b_j) = \frac{\exp[u_j(\theta - b_j)]}{1 + \exp(\theta - b_j)} ,$$

where $u_j = 1$ denotes a correct response and $u_j = 0$ an incorrect one,
and $b_j$ is the difficulty parameter of item j. We assume IRT
functions that are twice differentiable, and interpret
$P(U_j = 1 \mid \theta, \beta_j)$ as the proportion of correct responses we would
expect to many items with parameter $\beta_j$ from many examinees with that
value of $\theta$.

Under the usual assumption of local independence, the
conditional probability of the response vector $U = (U_1, \ldots, U_n)$
for n items is obtained by the product rule:

$$P(U = u \mid \theta, \beta) = \prod_{j=1}^{n} P(U_j = u_j \mid \theta, \beta_j) . \qquad (1)$$
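The product rule in (1) can be sketched in a few lines of Python for the Rasch case. This is only an illustration: the function names and the three item difficulties are hypothetical choices, not values taken from the paper.

```python
import math
from itertools import product

def rasch_prob(u, theta, b):
    """P(U_j = u | theta, b_j) under the Rasch model, for u in {0, 1}."""
    return math.exp(u * (theta - b)) / (1.0 + math.exp(theta - b))

def response_pattern_prob(us, theta, bs):
    """Equation (1): product of item probabilities under local independence."""
    prob = 1.0
    for u, b in zip(us, bs):
        prob *= rasch_prob(u, theta, b)
    return prob

# Hypothetical difficulties for a three-item test (illustrative values only).
DIFFICULTIES = [-1.0, 0.0, 1.0]

# Probability of the pattern (right, right, wrong) at theta = 0.5.
p_pattern = response_pattern_prob([1, 1, 0], 0.5, DIFFICULTIES)

# The probabilities of all 2^n patterns sum to one at any theta.
total = sum(response_pattern_prob(us, 0.5, DIFFICULTIES)
            for us in product((0, 1), repeat=len(DIFFICULTIES)))
```

Viewed as a function of theta with the pattern held fixed, `response_pattern_prob` is the likelihood function for that pattern.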



It is further assumed that if y denotes background information
about an examinee such as age, GPA, or courses taken, then

$$P(U = u \mid \theta, \beta, y) = P(U = u \mid \theta, \beta) .$$

When there is no possibility of missing responses, (1) can
be interpreted as a likelihood function, say $L(\theta \mid u)$, once a
particular value u of U has been observed. Direct likelihood
inferences are based solely on relative values of L at different
values of $\theta$. It might be said, for example, that the probability
of u is twice as high at one value of $\theta$ as at another. The maximum
likelihood estimate (MLE), $\hat\theta$, is the value at which u has the highest
probability. Note that in direct likelihood inference, the MLE
concerns only the data that were actually observed.

The role of the MLE in sampling distribution inferences
concerns its distribution under repeated sampling of observations
with a fixed "true" parameter value. If n is large, the sampling
distribution of $\hat\theta$ as computed from repeated observations of U can
be approximated by a normal distribution with mean $\theta$ and variance

$$\sigma^2 = -\left[ \frac{\partial^2 \ell(\theta \mid u)}{\partial \theta^2} \right]^{-1} ,$$

where $\ell(\theta \mid u) = \log L(\theta \mid u)$. By considering the distribution of $\hat\theta$
over hypothetical draws from the sample space, sampling
distribution inferences involve datasets that could have been
observed, but were not.
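As a rough illustration of the MLE and its large-sample variance, the sketch below applies Newton-Raphson to the Rasch log likelihood, for which the first derivative is the sum of (u_j - P_j) and the second derivative is minus the sum of P_j(1 - P_j). The function names, starting value, and item difficulties are assumptions made for the example; the paper does not prescribe an algorithm.

```python
import math

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rasch_mle(us, bs, theta=0.0, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE of theta with known difficulties.

    Returns (theta_hat, variance), where variance is the negative inverse
    of the second derivative of the log likelihood at theta_hat.
    Assumes a mixed response pattern; all-correct or all-incorrect
    patterns have no finite MLE.
    """
    for _ in range(max_iter):
        ps = [p_correct(theta, b) for b in bs]
        grad = sum(u - p for u, p in zip(us, ps))   # d log L / d theta
        hess = -sum(p * (1.0 - p) for p in ps)      # d^2 log L / d theta^2
        step = grad / hess
        theta -= step
        if abs(step) < tol:
            break
    ps = [p_correct(theta, b) for b in bs]
    information = sum(p * (1.0 - p) for p in ps)
    return theta, 1.0 / information

# Hypothetical three-item test: two right, one wrong.
theta_hat, variance = rasch_mle([1, 1, 0], [-1.0, 0.0, 1.0])
```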

Bayesian inferences are based on the posterior distribution
for $\theta$ given u, or

$$p(\theta \mid u) = K \, L(\theta \mid u) \, p(\theta) , \qquad (2)$$

where K is a normalizing constant and $p(\theta)$ conveys knowledge
about $\theta$ before a value of U is observed. The posterior mean and
mode of $\theta$ are sometimes taken as point estimates in IRT. The
posterior variance is approximated by $\sigma^2$ when n is large. (This
is the variance of the posterior distribution for $\theta$ induced by the
data actually observed, in contrast to the variance of an
estimator over hypothetical repeated observations.)

In many applications of IRT, an examinee provides responses
to only a subset of the n items to which responses could have
been observed. The data thus consist of (i) the identification
of the subset of items to which responses are observed and (ii)
the responses to those items. The first inferential problem we
address is to estimate an individual examinee's $\theta$ from this
extended observation, assuming that both the IRT model and the
item parameters are known. To this end, we adapt notation from
Little and Rubin (1987) and Rubin (1976) in defining the
following terms:

o $U = (U_1, \ldots, U_n)$ is the (hypothetical) random vector of
responses to all items in the full item set.

o $M = (M_1, \ldots, M_n)$ is an associated "missing-data indicator,"
with each element taking values of 0 or 1. If $m_j = 1$, the
value of $U_j$ will be observed; if $m_j = 0$, the value of $U_j$ will
be missing.

o $V = (V_1, \ldots, V_n)$ conveys the data that are actually observed:
$V_j = U_j$ if $m_j = 1$ but $V_j = *$ if $m_j = 0$.

An observed value of M, say m, effects a partition of U, u,
V, and v according to which elements are observed and which are
missing. That is, we may write $U = (U_{mis}, U_{obs})$ to distinguish the
missing and observed elements of U, respectively. Similarly,
$u = (u_{mis}, u_{obs})$, $V = (V_{mis}, V_{obs})$, and $v = (v_{mis}, v_{obs})$. As with u and
m, let v denote a realized value of V.



Example 

An examinee is administered a two-item test. With each item
scored right or wrong (1 or 0), there are $2^2 = 4$ possible
patterns for U: (0,0), (0,1), (1,0), and (1,1). The second
response may be missing, however. With 1 representing "observed"
and 0 representing "missing," there are 4 conceivable patterns
for M, of which (1,0) and (1,1) can be realized. If the examinee
would have responded incorrectly to the first item and correctly
to the second, but the response for the second item is missing,
then u = (0,1), m = (1,0), and v = (0,*). #













Inferences must of course be based on the data that are
actually observed, namely realizations of $V = (U_{obs}, M)$. Starting from
the hypothetical complete data vector (U,M), even if there is no
intention of observing a response to every item, is a convenient
way to begin. It forces us to explicate our beliefs about the
relationships among ability, item response, and missingness,
which is exactly what is required for building a sensible model for V.
Recalling that p(u,m) can be written as p(m|u) p(u) or as p(u|m)
p(m), define the following densities:



o $f_\theta(u)$ is the density for all n responses. In this paper,
$f_\theta(u)$ takes the form shown in (1), so by local independence
we can write $f_\theta(u) = f_\theta(u') f_\theta(u'')$ for any ordering and
partitioning of the items into $(u', u'')$, including
$(u_{mis}, u_{obs})$.

o $g_\phi(m \mid u)$ is the probability that M takes the value
$m = (m_1, \ldots, m_n)$ given that U takes the value $u = (u_1, \ldots, u_n)$,
with $\phi$ being the (possibly vector-valued) parameter of the
missingness process. It is possible for $\theta$ to be a component
of $\phi$, in which case the value of $\theta$ itself plays a role in
determining whether a response will be observed. In these
cases we shall sometimes write $g(m \mid u, \theta, \phi)$ to emphasize the
dependence on $\theta$.

o $h_{\theta\phi}(u \mid m)$ is the probability that U takes the value u given
that M takes the value m.

o $t_{\theta\phi}(m)$ is the probability that M takes the value m. Again, $\theta$
may be a component of $\phi$.



Example (continued) 

Suppose that the missingness process in the two-item example
initiated above can be described as follows: The second response
is never observed when the first response is correct; the second
response will be observed with probability $\phi$ if the first response
is incorrect. Then

$$g_\phi(m \mid u) = \begin{cases}
1 & \text{if } m = (1,0) \text{ and } u = (1,0) \text{ or } (1,1) \\
1 - \phi & \text{if } m = (1,0) \text{ and } u = (0,0) \text{ or } (0,1) \\
\phi & \text{if } m = (1,1) \text{ and } u = (0,0) \text{ or } (0,1) \\
0 & \text{otherwise.}
\end{cases}$$ #



Whenever not all potential responses may be observed for any
reason, even if they all do turn out to be observed, the data are
V. To obtain the likelihood function, we start with the
likelihood for the (hypothetical) complete data (U,M), then
average over the missing responses:

$$L(\theta, \phi \mid v) = \delta(\theta, \phi) \int f_\theta(u_{mis}, u_{obs}) \, g_\phi(m \mid u_{mis}, u_{obs}) \, du_{mis} , \qquad (3)$$

where $\delta(\theta, \phi)$ takes the value 1 if the value $(\theta, \phi)$ is in the parameter
space $\Omega_{\theta\phi}$, and 0 if it isn't. This observed-data likelihood is a
weighted average over all complete-data likelihoods that have the
targeted responses to the observed items. The weights are
proportional to the probabilities of these potential response
patterns for the different values of the missing responses, given
m and the observed responses. Using
local independence, we can bring the probability for the observed
responses outside the integral:

$$L(\theta, \phi \mid v) = \delta(\theta, \phi) \, f_\theta(u_{obs}) \int f_\theta(u_{mis}) \, g_\phi(m \mid u_{mis}, u_{obs}) \, du_{mis} .$$

Equivalently, using the alternative expression for p(u,m),

$$L(\theta, \phi \mid v) = \delta(\theta, \phi) \, t_{\theta\phi}(m) \int h_{\theta\phi}(u_{mis}, u_{obs} \mid m) \, du_{mis} . \qquad (4)$$



Appropriate likelihood inferences are based on relative values of
$L(\theta, \phi \mid v)$ at various values of $(\theta, \phi)$, or at various values of $\theta$
after eliminating $\phi$ by conditioning or maximizing. Appropriate
Bayesian inferences are based on the posterior distribution

$$p(\theta, \phi \mid v) \propto L(\theta, \phi \mid v) \, p(\theta, \phi) , \qquad (5)$$

where $p(\theta, \phi)$ conveys prior knowledge about $\theta$ and $\phi$. Appropriate
sampling distribution maximum likelihood inferences concern the
distribution of $(\hat\theta, \hat\phi)$ from (3) or (4), over hypothetical repeated
observations of V for fixed $(\theta, \phi)$.

In general, then, the correct likelihood function involves a
nuisance parameter $\phi$ and depends not on just the responses that
were observed, through $f_\theta(u_{obs})$, but on responses that were
not observed, through $f_\theta(u_{mis})$ and $g_\phi(m \mid u_{mis}, u_{obs})$.

Example (continued)

With IRT for binary variables, the integral over $u_{mis}$ that
appears in (3) is a summation over all possible response patterns
with $U_{obs} = u_{obs}$. In the two-item example with the first response
incorrect and the second response missing, the potential complete
patterns u are (0,1) and (0,0). Thus,

$$L(\theta, \phi \mid V = (0,*)) = f_\theta(U_1 = 0) \times \{ f_\theta(U_2 = 0) \, g_\phi[M = (1,0) \mid U = (0,0)] + f_\theta(U_2 = 1) \, g_\phi[M = (1,0) \mid U = (0,1)] \} . \qquad (6)$$ #
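Equation (6) is small enough to evaluate directly. The sketch below codes the missingness probabilities g as tabulated in the example and compares (6) with the likelihood based on the first response alone. The Rasch form for f, the item difficulties, and the function names are assumptions made for illustration.

```python
import math

def f(u, theta, b):
    """Rasch probability of response u (0 or 1) to an item of difficulty b."""
    return math.exp(u * (theta - b)) / (1.0 + math.exp(theta - b))

def g(m, u, phi):
    """Missingness probabilities g_phi(m | u) for the two-item example."""
    if u[0] == 1:                       # first response correct
        return 1.0 if m == (1, 0) else 0.0
    if m == (1, 0):                     # first response incorrect, second missing
        return 1.0 - phi
    if m == (1, 1):                     # first response incorrect, second observed
        return phi
    return 0.0

def likelihood_observed(theta, phi, b1, b2):
    """Equation (6): likelihood of v = (0, *), summing over the missing U_2."""
    weighted = sum(f(u2, theta, b2) * g((1, 0), (0, u2), phi) for u2 in (0, 1))
    return f(0, theta, b1) * weighted

def likelihood_ignoring(theta, b1):
    """Likelihood from the observed response U_1 = 0 alone."""
    return f(0, theta, b1)

# Hypothetical difficulties; the ratio below is the same at every theta.
B1, B2, PHI = -0.5, 0.5, 0.25
ratios = [likelihood_observed(t, PHI, B1, B2) / likelihood_ignoring(t, B1)
          for t in (-2.0, 0.0, 2.0)]
```

Because the ratio of the two likelihoods does not vary with theta, likelihood ratios for theta are the same whether or not the missingness process is modeled in this example.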



Conditions for Ignorability 

Ignoring the missingness process when drawing inferences
about $\theta$ means that rather than using the correct likelihood
given in (3) or (4), one proceeds by
using a facsimile of (1) as applied to $u_{obs}$ alone:

$$L^*(\theta \mid u_{obs}) = \delta(\theta, \Omega_\theta) \prod_{obs} P(U_j = u_j \mid \theta, \beta_j) , \qquad (7)$$

where the product runs over the items for which $m_j = 1$.
In particular, direct likelihood inferences about $\theta$ that ignore
the missingness process simply compare values of $L^*$ at various
values of $\theta$. Bayesian inferences that ignore the missingness
process start with an analogue of (2), a pseudo-posterior
distribution proportional to

$$L^*(\theta \mid u_{obs}) \, p(\theta) . \qquad (8)$$



Sampling-distribution maximum likelihood inferences that
ignore the missingness process consider the distribution of $\hat\theta$ from
(7) over repeated samples of responses to the items for which
$m_j = 1$. This involves a different reference sample space: not the
sample space of v values, driven by $(\theta, \phi)$, but a sample space of
$u_{obs}$ values for a fixed m, driven by $\theta$. (This reasoning is used
in survey sampling when the exact size of the sample is not known
before it is obtained. Even though the sample size N is a random
variable with its own distribution and parameters, standard errors
for $\hat\theta$ are typically computed with respect to repeated draws with
the observed sample size N, rather than with respect to repeated
draws of $(U_{obs}, N)$.)

It is a pleasant state of affairs when ignoring the
missingness process leads to the correct inferences about $\theta$, since
(7) and (8) don't require the specification of g, h, or t, and
standard computing algorithms can be used. Depending on why the
missing responses were missing, however, these procedures need not
lead to the correct inferences. Rubin (1976, 1987) specifies
conditions under which a missingness process can be ignored under
sampling distribution, direct likelihood, and Bayesian inference.
They involve the concepts missing at random, missing completely at
random, and distinctness of parameters:

Definition 1: Missing responses are missing at random (MAR) if
for each value of $\phi$ and for all fixed values m and $u_{obs}$,
$g_\phi(m \mid u_{mis}, u_{obs})$ takes the same value for all $u_{mis}$. (This
definition of MAR applies to the missingness process in general,
as in Rubin, 1987, rather than a specific value of the missingness
variable, as in Rubin, 1976.)

Definition 2: Missing responses are missing completely at random
(MCAR) if for each value of $\phi$ and for each fixed value m, $g_\phi(m \mid u)$
takes the same value for all u.

Definition 3: The parameter $\theta$ is distinct (D) from $\phi$ if their
joint parameter space factors into a $\theta$-space and a $\phi$-space, and
when prior distributions are specified for $\theta$ and $\phi$, they are
independent.

Taken together, MCAR and D imply that the values of both the
observed and the missing responses are independent of the pattern 
of missingness. MAR and D together imply that the values of the 
missing responses are independent of the pattern of missingness, 
conditional on the values of the observed responses. MCAR implies 
MAR. 



Example (continued)

In order to satisfy MCAR, it must be that for each value of $\phi$
and any value of m, $g_\phi(m \mid u)$ takes the same value for all u. In
our two-item example, however, $g_\phi(M = (1,0) \mid U = (1,1)) = 1$ while
$g_\phi(M = (1,0) \mid U = (0,0)) = 1 - \phi$. Except in the trivial case that $\Omega_\phi = \{0\}$,
MCAR is not satisfied.

In order to satisfy MAR, it must be that for each value of $\phi$
and any fixed values of m and $u_{obs}$, $g_\phi(m \mid u_{mis}, u_{obs})$ takes the same
value for all values of $u_{mis}$. This condition is satisfied
trivially whenever m = (1,1), since there are no missing
observations. It is also satisfied trivially in our example when
m = (0,1) or m = (0,0), since these missingness patterns have
probability zero for all u. The following equalities for m = (1,0)
complete the verification of MAR:

$$g_\phi(M = (1,0) \mid U = (1,0)) = g_\phi(M = (1,0) \mid U = (1,1)) = 1$$

$$g_\phi(M = (1,0) \mid U = (0,0)) = g_\phi(M = (1,0) \mid U = (0,1)) = 1 - \phi .$$ #
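The MCAR and MAR checks for the two-item example can also be carried out by brute-force enumeration of Definitions 1 and 2. The sketch below is one way to do so; the helper names `is_mcar` and `is_mar` are hypothetical, and g is coded from the table given in the example.

```python
from itertools import product

PATTERNS = list(product((0, 1), repeat=2))   # all (0/1, 0/1) pairs

def g(m, u, phi):
    """Missingness probabilities g_phi(m | u) for the two-item example."""
    if u[0] == 1:
        return 1.0 if m == (1, 0) else 0.0
    if m == (1, 0):
        return 1.0 - phi
    if m == (1, 1):
        return phi
    return 0.0

def is_mcar(phi):
    """Definition 2: for each m, g(m | u) is the same for every u."""
    return all(len({g(m, u, phi) for u in PATTERNS}) == 1 for m in PATTERNS)

def is_mar(phi):
    """Definition 1: for each m and u_obs, g(m | u) is the same for every u_mis."""
    for m in PATTERNS:
        by_observed = {}
        for u in PATTERNS:
            # u_obs keeps only the responses flagged as observed by m
            u_obs = tuple(uj for uj, mj in zip(u, m) if mj == 1)
            by_observed.setdefault(u_obs, set()).add(g(m, u, phi))
        if any(len(values) > 1 for values in by_observed.values()):
            return False
    return True

mar_holds = is_mar(0.3)        # MAR holds for any phi
mcar_holds = is_mcar(0.3)      # MCAR fails unless phi = 0
```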



We are now in a position to summarize Rubin's conclusions
regarding direct likelihood and Bayesian inference. First, a
more easily verified sufficient condition:

o When making direct-likelihood or Bayesian inferences about
$\theta$, it is appropriate to ignore the process that causes
missing data if missing data are missing at random and the
parameter of the missing data process is "distinct" from $\theta$
(Rubin, 1976, p. 581).

When MAR is satisfied, g does not depend on $u_{mis}$ and can be
brought out of the integral over $u_{mis}$ in the form of the likelihood
obtained by local independence; the integral then simply integrates
to one. If D is satisfied as well, the only dependence of
$L(\theta, \phi \mid v)$ on $\theta$ is through $L^*(\theta \mid u_{obs}) = f_\theta(u_{obs})$.



Under weaker conditions for ignorability, the integral need
not drop out as it does under MAR, but its value does not depend
on $\theta$. Necessary and sufficient conditions are as follows:

o Suppose $L^*(\theta \mid u_{obs}) > 0$ for all $\theta \in \Omega_\theta$. All likelihood ratios
for $\theta$ ignoring the process that causes missing data are
correct for all $\phi \in \Omega_\phi$, if and only if (a) $\Omega_{\theta\phi} = \Omega_\theta \times \Omega_\phi$ and
(b) for each $\phi \in \Omega_\phi$,

$$E_{u_{mis}} [ \, g_\phi(m \mid u_{mis}, u_{obs}) \mid m, u_{obs}, \theta, \phi \, ] \qquad (9)$$

takes the same positive value for all $\theta$. (From Rubin, 1976,
Theorem 7.2.)



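o The posterior distribution of $\theta$ ignoring the process that
causes missing data equals the correct posterior distribution
of $\theta$ if and only if

$$E_{\phi, u_{mis}} [ \, g_\phi(m \mid u_{mis}, u_{obs}) \mid m, u_{obs}, \theta \, ] ,$$

or

$$\int \!\! \int g_\phi(m \mid u_{mis}, u_{obs}) \, p(u_{mis} \mid \theta) \, p(\phi \mid \theta) \, d\phi \, du_{mis} ,$$

takes a constant positive value. (From Rubin, 1976,
Theorem 8.2.)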

Example (continued)

Equation 6 gives the correct likelihood for the
observed data v = (0,*), namely $L(\theta, \phi \mid V = (0,*))$. When does the
pseudo-likelihood $L^*(\theta \mid U_1 = 0)$ yield the same direct likelihood
inferences about $\theta$? For this to happen, it must first hold that
the $\theta$ and $\phi$ parameter spaces are distinct; it cannot be, for
instance, that the observed pattern of missingness could occur for
some values of $\theta$ but not for others. Second, the following term
that appears in (6) must be constant for all values of $\theta$:

$$f_\theta(U_2 = 0) \, g_\phi[M = (1,0) \mid U = (0,0)] + f_\theta(U_2 = 1) \, g_\phi[M = (1,0) \mid U = (0,1)] .$$

MAR would mean that $g_\phi[M = (1,0) \mid U = u]$ is constant over the
potential complete patterns u = (0,0) and (0,1), in
which case the expression simplifies to

$$[ f_\theta(U_2 = 0) + f_\theta(U_2 = 1) ] \, g_\phi[M = (1,0)] ;$$

then, since the sum in brackets is one, simply to the constant
value $g_\phi[M = (1,0)]$. When this happens, the sufficient condition
for ignorability is satisfied. But even if $g_\phi[M = (1,0) \mid U = u]$ is not
constant over u, the entire expression can be constant for $\theta$ if
the variations in f and g over $\theta$ cancel each other out. For
example, it could be that $\phi = \theta$ and, for $u_2 = 0, 1$,

$$g_\phi[M = (1,0) \mid U = (0, u_2)] \propto [ f_\theta(U_2 = u_2) ]^{-1} .$$

As we shall see in the case of intentional omissions, such
constraints are not generally plausible in the context of IRT. #

When ignorability under direct likelihood inference holds for
a given missingness process, as occurs when MAR is satisfied, the
correct value of $\hat\theta$ is identified as the MLE. The usual
sampling-distribution interpretation of $\hat\theta$ and $\sigma^2$ may or may not be
justified. (Recall that if the sampling interpretation is to be
justified at all, it will be with respect to repeated response
sampling with M fixed at m. The variance of $\hat\theta$, for example, may
be quite different in this frame of reference from its variance
under repeated samples of V.)

First, a sufficient condition:



o When making sampling-distribution inferences about statistics
T(v), it is appropriate to ignore the process that causes
missing data if missing data are missing completely at random
and the parameter of the missing data process is "distinct"
from $\theta$. (Little and Rubin, 1987, p. 14)

Under these conditions, $p(U = u, M = m \mid \theta, \phi) = f_\theta(u) \, g_\phi(m)$ with $\theta$ and $\phi$
distinct, and v may be thought of as the outcome of a two-stage
experiment: $\phi$ determines m in the first stage and $\theta$ determines
$u_{obs}$ in the second. An experimenter looking at the results of the
second stage has the same information about $\theta$ as an experimenter
who has performed only that latter experiment, with the value m
predetermined.

A necessary and sufficient condition for ignorability for 
sampling inferences about a generic statistic T(v), based on 
Rubin's (1976) Theorem 6.2, is as follows: 



o The sampling distribution of T(v) under $f_\theta$ calculated by
ignoring the process that causes missing data equals the
correct conditional sampling distribution of T(v) given
m under $f_\theta$ and $g_\phi$ if and only if, for each fixed
value of m,

$$E_{u_{mis}} [ \, g_\phi(m \mid u) \mid m, u_{obs}, \theta, \phi \, ]$$

takes the same positive value for all $u_{obs}$.

Equivalently, the probability of each missingness pattern must not
depend on the values of the responses that are observed; for each
fixed m and any $u'_{obs}$ and $u''_{obs}$, it must be true that

$$\Pr(M = m \mid U_{obs} = u'_{obs}, \theta, \phi) = \Pr(M = m \mid U_{obs} = u''_{obs}, \theta, \phi) .$$

This condition is implied by Rubin's (1976) slightly stronger
"observed at random." Unless it holds, $(u_{obs}, m)$ does not admit of
a decomposition into a sequence of independent experiments because
the value of $u_{obs}$ plays a role in determining M, and the
conditional frame of reference is not appropriate.

Inferences about Examinee Ability 
The following sections address in turn the common types of
missingness in IRT that were listed in the introduction, in the
problem of drawing inferences about $\theta$ when $\beta$ is known. In each
case, we consider whether the conditions for ignoring missingness
are plausible, and, when they are not, discuss how the
missingness process might be modeled so that inferences can be
drawn.




Case 1: Alternate test forms 

By "alternate test forms," we mean tests whose items all fit
the same IRT model, and which provide information sufficiently
similar that the test administrator is indifferent as to which
form any particular examinee is presented. The form an examinee
receives will depend on a random process such as a coin flip or a
form-spiralling scheme. The common practice in IRT applications
with alternate test forms is to base inferences about $\theta$ on
$L^*(\theta \mid u_{obs})$, ignoring the missingness process.
The use of K alternate test forms implies that only K
missingness patterns, $m_1, \ldots, m_K$, can occur, where all the
elements of $m_k$ are zero except those that correspond to the items
that appear in Form k. Denote their respective probabilities by
$\phi_k = P(M = m_k)$. Assuming the IRT model means that $f_\theta(U)$ is as given
in (1); that is, we assume that item responses would be governed
by $\theta$ alone, regardless of which items would be presented. Even
though the items of only one form will actually be presented, it
is possible to express our assumptions about the connection
between the (hypothetical) values of the complete response pattern and
the probability of the missingness pattern as follows:

$$g_\phi(m \mid u) = \begin{cases}
\phi_k & \text{for all } u \text{ if } m = m_k \text{; i.e., Form } k \\
0 & \text{otherwise.}
\end{cases}$$



Since the values of g do not depend on u, MCAR, and therefore
MAR, are satisfied. Verifying D for likelihood inference requires
that all values of $\theta$ are possible with all possible values of $\phi$;
they are. Verifying D for Bayesian inference requires that prior
beliefs about $\theta$ and $\phi$ be independent; this is eminently reasonable
as well. Having satisfied the sufficient conditions MCAR and D,
we conclude that the missingness caused by the random
administration of alternate test forms is ignorable under direct
likelihood and Bayesian inference, and under sampling distribution
interpretations of $\hat\theta$. Common practice is therefore justified.

Case 2: Targeted Testing 

Targeted testing also involves multiple test forms, but ones
in which the distributions of item difficulty differ from form to
form. Exploiting the fact that estimates of $\theta$ are more precise
when an examinee is administered items with difficulties near $\theta$,
targeted testing uses background information y about an examinee
to select a test form that will probably be more informative about
him than other forms. For example, an easy form and a hard form
might be constructed from a set of n items calibrated together
under the same IRT model, then the easy form could be given to
first graders and the hard form to second graders.

As in Case 1, the existence of K forms implies that only K
patterns of M, namely $m_1, \ldots, m_K$, can be realized. The
parameter of the missingness process has values $\phi_{ky}$, which
indicate the probability that an examinee with background
variable y will be administered Form k. For at least one k and
two values y' and y'', $\phi_{ky'} \neq \phi_{ky''}$. This happens when
$p(\theta \mid y') \neq p(\theta \mid y'')$ and the difficulty of Form k is better suited to the
typical examinee with one value of y than the other. If we denote
the easy and hard forms in the two-form example mentioned above as
1 and 2, and let y denote grade level (1 or 2), then
$g_\phi(M = m_k \mid U = u, y) = \phi_{ky}$ for all u, with

$$\phi_{ky} = P(M = m_k \mid y) = \begin{cases}
1 & \text{if } y = 1 \text{ and } k = 1 \\
0 & \text{if } y = 1 \text{ and } k = 2 \\
0 & \text{if } y = 2 \text{ and } k = 1 \\
1 & \text{if } y = 2 \text{ and } k = 2 .
\end{cases}$$


Because g does not depend on u, MCAR is satisfied. Assuming
that all values of $\theta$ are possible at all values of y, even if
they are more likely for some values of y than others,
distinctness as required for direct likelihood inference is also
satisfied. The values of maximum likelihood estimates of $\theta$ from
$L^*(\theta \mid u_{obs})$ are therefore the correct values under targeted
testing. This is all that matters for direct likelihood
interpretation of the MLE. Sampling-distribution interpretations
are also appropriate, with respect to repeated administrations of
the form that actually was administered.

Distinctness as required for Bayesian inference is not
satisfied. Prior beliefs about $\theta$ and $\phi$ are associated through y,
so $p(\theta, \phi) \neq p(\theta) \, p(\phi)$. Intuitively, knowing which form an
examinee was administered under targeted testing is a source of
information about $\theta$ since form selection depends on prior
knowledge about $\theta$ through y. This knowledge must be taken into
account in Bayesian inference. It is true, however, that
$p(\theta, \phi \mid y) = p(\theta \mid y) \, p(\phi \mid y)$. Bayes-distinctness is satisfied conditional on
y, and the missingness process can be ignored conditional on y.
Thus, the correct Bayesian inferences under targeted testing are
obtained with $L^*(\theta \mid u_{obs}) \, p(\theta \mid y)$, but generally not with $L^*(\theta \mid u_{obs}) \, p(\theta)$.
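The practical difference between using the conditional prior for an examinee's grade and using a prior that disregards grade can be seen with a simple grid approximation to the posterior mean. Everything numerical here is hypothetical: the normal priors, their means, the hard-form difficulties, and the response pattern are chosen only for illustration.

```python
import math

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def grid_posterior_mean(us, bs, prior_mean, prior_sd=1.0,
                        lo=-6.0, hi=6.0, steps=1201):
    """Posterior mean of theta on a grid: L*(theta | u_obs) times a normal prior."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    weights = []
    for theta in grid:
        like = 1.0
        for u, b in zip(us, bs):
            p = p_correct(theta, b)
            like *= p if u == 1 else 1.0 - p
        prior = math.exp(-0.5 * ((theta - prior_mean) / prior_sd) ** 2)
        weights.append(like * prior)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

# A second grader takes the hard form (hypothetical difficulties and responses).
hard_form = [0.5, 1.0, 1.5]
responses = [1, 0, 1]

# Assumed priors: second graders centered at 1.0, a grade-blind prior at 0.0.
mean_given_grade = grid_posterior_mean(responses, hard_form, prior_mean=1.0)
mean_grade_blind = grid_posterior_mean(responses, hard_form, prior_mean=0.0)
```

With the same responses, the two priors give different posterior means, which is why the choice between them matters under targeted testing.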



Case 3: Adaptive Testing 

As mentioned for Case 2, IRT measurement can be made more
efficient by presenting an examinee with items that are
informative in the neighborhood of his $\theta$. Adaptive testing uses
information from an examinee's preceding responses, and possibly
from his background variables y as well, to select each next
item to administer. As responses accumulate, more is known about
$\theta$ and successive item selections are more accurately targeted.

The datum observed in adaptive testing is a sequence of nobs
($\le n$) ordered pairs, $S = ((I_1, U_{obs(1)}), \ldots, (I_{nobs}, U_{obs(nobs)}))$,
where $I_k$ identifies the k'th item administered and $U_{obs(k)}$ is the
response to that item. Define the partial response sequence $S_k$ as
the first k ordered pairs in S, with the null sequence $s_0$
representing the status as the test begins. Augment the set of n
items with the fictitious Item 0, the selection of which
corresponds to a decision to terminate testing. It can be
written as the nobs+1'st item in the test, although no response
is associated with it.

A test administrator defines an adaptive test design by
specifying, for all items j, all realizable partial response
sequences $s_k$, and all values of y, the probabilities $\phi(j, s_k, y)$
that item j will be selected as the k+1'st test item, after
observing the partial response sequence $s_k$ from an examinee with
Y=y. The dependence of item selection probabilities upon y allows
for a hard item to be the first one presented to a high school
graduate, say, but an easy one to be first for a nongraduate.
Item selection probabilities in designs that do not use y can be
written simply as $\phi(j, s_k)$. We begin by considering designs of
this latter type.

One example of an adaptive testing design is Bayesian
minimum variance item selection (Owen, 1975). In its pure form,
the item that minimizes the expected posterior variance of $\theta$,
using the current posterior distribution $p(\theta \mid s_k)$, is chosen as the
k+1'st item with probability one. To reduce the exposure of more
informative items, positive probabilities may instead be assigned
to several fairly informative items. Typically, testing continues
until either a desired level of precision is reached, or a
predetermined number of items has been administered.

A second example of an adaptive testing design is the
two-item example employed earlier in this paper. Its definition of g
corresponds to administering the first item to all examinees, and
with probability $\phi$, the second item to some of the examinees who
answered the first item incorrectly.

In adaptive testing, the probability of observing s from an
examinee with ability $\theta$ can be built up sequentially. The
probability of selection for the first item is $\phi(i_1, s_0)$. The
probability of response $u_{obs(1)}$ is given by the IRT
model as $f_\theta(u_{obs(1)})$; by conditional independence, this is a value that
does not depend on the fact that this item happened to have been
presented first. The conditional probability of selection for the
second item given $s_1$ is $\phi(i_2, s_1)$, a value independent of $\theta$. The
probability of the corresponding response is $f_\theta(u_{obs(2)})$, a value
independent of the identification of, and the response to, the
first item. Continuing in this manner until it is determined to
stop testing, with probability $\phi(0, s)$, we obtain

$$P(s \mid \theta) = \prod_{k=1}^{nobs+1} \phi(i_k, s_{k-1}) \prod_{k=1}^{nobs} f_\theta(u_{obs(k)}) .$$

The likelihood function induced by the observation of s is thus

$$L(\theta \mid s) = \prod_{k=1}^{nobs+1} \phi(i_k, s_{k-1}) \; L^*(\theta \mid u_{obs}) . \qquad (11)$$



Observe that:

1. s conveys the value of m: $m_j = 1$ if $i_k = j$ for some k,
$1 \le k \le nobs$; otherwise, $m_j = 0$.

2. s conveys the value of $u_{obs}$, namely the responses to the
items administered during the course of the test.

3. L factors into two products, the first of which depends on
$\phi$, and the second of which, namely $L^*(\theta \mid u_{obs})$, depends on $\theta$ and $u_{obs}$.

4. If s' and s'' imply the same m and $u_{obs}$, then $L(\theta \mid s') \propto L(\theta \mid s'')$.
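The factorization of the probability of a response sequence into a selection part and a response part can be checked numerically for the two-item adaptive design. In this sketch the Rasch form of f, the item difficulties, and all function names are illustrative assumptions; sequences are written as tuples of (item, response) pairs, and item 0 stands for the decision to stop.

```python
import math

def f(u, theta, b):
    """Rasch probability of response u (0 or 1) to an item of difficulty b."""
    return math.exp(u * (theta - b)) / (1.0 + math.exp(theta - b))

def select_prob(j, s, phi):
    """Selection probability for item j after partial sequence s."""
    if len(s) == 0:                      # test begins: always present Item 1
        return 1.0 if j == 1 else 0.0
    if len(s) == 1 and s[0][1] == 0:     # Item 1 answered incorrectly
        return {2: phi, 0: 1.0 - phi}.get(j, 0.0)
    return 1.0 if j == 0 else 0.0        # otherwise stop with certainty

def sequence_factors(s, theta, phi, bs):
    """Return (selection part, response part) of P(s | theta)."""
    selection, response = 1.0, 1.0
    for k, (item, u) in enumerate(s):
        selection *= select_prob(item, s[:k], phi)
        response *= f(u, theta, bs[item])
    selection *= select_prob(0, s, phi)  # the nobs+1'st "item" is the stop decision
    return selection, response

BS = {1: -0.5, 2: 0.5}                   # hypothetical difficulties
SEQUENCES = [((1, 1),), ((1, 0),), ((1, 0), (2, 0)), ((1, 0), (2, 1))]

def total_probability(theta, phi):
    """Sum of P(s | theta) over every realizable sequence."""
    return sum(sel * resp for sel, resp in
               (sequence_factors(s, theta, phi, BS) for s in SEQUENCES))
```

The selection part returned by `sequence_factors` is the same at every theta, so comparing likelihoods at different theta values involves only the response part.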



Points 3 and 4 justify the use of $L^*(\theta \mid u_{obs})$ for direct
likelihood inference. It may be instructive nonetheless to verify
the satisfaction of MAR. Now $P(M=m, U=u)$, or the probability of
the hypothetical complete observation (m,u), is the probability of
observing a response sequence s that yields the targeted m and
$u_{obs}$, times the probability that the unobserved responses
$u_{mis}$ also take the targeted values. Defining
$T = \{s : M = m \cap U_{obs} = u_{obs}\}$
as the set of response sequences that present the targeted items
and have the targeted responses to them, we have

$$\begin{aligned}
P(M=m, U=u) &= \Bigl\{ \sum_{T} \prod_{k=1}^{nobs+1} \phi(i_k, s_{k-1}) \prod_{k=1}^{nobs} f_\theta(u_{obs(k)}) \Bigr\} \Bigl\{ \prod_{k=1}^{nmis} f_\theta(u_{mis(k)}) \Bigr\} \\
&= \Bigl\{ \sum_{T} \prod_{k=1}^{nobs+1} \phi(i_k, s_{k-1}) \Bigr\} \prod_{j=1}^{n} f_\theta(u_j) \\
&= \Bigl\{ \sum_{T} \prod_{k=1}^{nobs+1} \phi(i_k, s_{k-1}) \Bigr\} P(U=u) .
\end{aligned}$$



But then

$$g_\phi(m \mid u) = P(M=m, U=u) / P(U=u) = \sum_{T} \prod_{k=1}^{nobs+1} \phi(i_k, s_{k-1}) ,$$

a value that does not depend on $u_{mis}$, as required for MAR. This
argument also holds when $\phi$ depends on y. (QED)
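The factorization in points 3 and 4 can be checked numerically on the two-item design mentioned earlier. The following is a minimal sketch, not the authors' code: it assumes a Rasch model for the response factors, and the function names, item difficulties, and the value of `phi` are illustrative choices.

```python
import math

def rasch(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def response_factor(theta, b, u):
    """f_theta(u): probability of response u (1 = right, 0 = wrong)."""
    p = rasch(theta, b)
    return p if u == 1 else 1.0 - p

def selection_factor(phi, responses):
    """Product of the selection and stopping probabilities along s.

    Assumed rule: item 1 is always presented; item 2 is presented with
    probability phi only after a wrong answer to item 1.
    """
    if len(responses) == 2:
        # item 2 can only follow a wrong answer to item 1
        return phi if responses[0] == 0 else 0.0
    if responses[0] == 0:
        return 1.0 - phi      # stopped although item 2 could have been given
    return 1.0                # a right answer always ends the test

def reduced_likelihood(theta, responses, difficulties):
    """L*(theta | u_obs): product of response factors for presented items."""
    value = 1.0
    for u, b in zip(responses, difficulties):
        value *= response_factor(theta, b, u)
    return value

def full_likelihood(theta, phi, responses, difficulties):
    """L(theta | s) as in (11): selection factor times L*(theta | u_obs)."""
    return selection_factor(phi, responses) * reduced_likelihood(
        theta, responses, difficulties)

difficulties = [0.0, -1.0]    # illustrative item difficulties
phi = 0.3                     # illustrative selection probability
responses = [0, 1]            # wrong on item 1, then right on item 2
ratios = [full_likelihood(t, phi, responses, difficulties)
          / reduced_likelihood(t, responses, difficulties)
          for t in (-2.0, 0.0, 2.0)]
print(ratios)                 # the same constant at every theta, up to rounding
```

Because the ratio of the two functions is constant in $\theta$, both are maximized at the same value, which is what ignorability under direct likelihood inference asserts.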

MAR and distinct parameter spaces are sufficient for
ignorability of the adaptive-testing missingness mechanism under
direct likelihood inference. Ignorability holds under Bayesian
inference if, in addition, the prior distributions for $\theta$ and
$\phi$ are independent. As with targeted testing, this latter condition
fails if for some y' and y'' for which $p(\theta \mid y') \ne p(\theta \mid y'')$, there
exist j and s such that $\phi(j,s,y') \ne \phi(j,s,y'')$. When this is so,
Bayesian inference demands the use of $p(\theta \mid y)$ rather than $p(\theta)$ in
conjunction with $L^*(\theta \mid u_{obs})$.

Even though ignorability under direct likelihood inference
means that $L^*$ yields the correct maximizing value from a given
observation, sampling-distribution interpretation of the MLE
$\hat\theta$ is not justified in general. To see this, recall that the
necessary condition for sampling-distribution ignorability requires
that for each fixed m and any $\theta$ and $\phi$, the probability

$$\Pr(M = m \mid U_{obs} = u_{obs}; \theta, \phi)$$

take the same value for every $u_{obs}$. That is, the probability of
any given missingness pattern would have to be the same no matter
what values the responses took. But since by definition adaptive
tests produce missingness patterns as a function of the response
values that are observed, only a degenerate adaptive testing scheme
could satisfy this condition.

Concluding that the item selection mechanism is not ignorable
for sampling distribution inference means that the correct
sampling distribution for $\hat\theta$ must be calculated with respect to
repeated administrations of the entire adaptive test. While
general theory does not relate its variance in this frame of
reference to the second derivative of $L^*$, the latter may be a
reasonable approximation of the former under particular adaptive
test designs. Whether this is so must be determined individually
for each adaptive test design, analytically in simple cases but by
simulation in more realistic cases.

Case 4: Not-Reached Items 

IRT is intended for "power" tests, or those in which an
examinee's chances of responding correctly would not differ
appreciably if the time limit were more generous. Time limits are
typically chosen to allow most examinees to respond to all items,
but a few examinees won't have time to answer all of them. This
section concerns the items that an examinee does not reach. It
assumes the examinee has not interacted with the item--e.g., he
has not seen what the items at the end of the test ask, and
decided to work instead on the ones he has seen at the beginning
of the test.

It is common practice to identify not-reached items by
working from the end of an examinee's response string toward the
beginning, taking unanswered items as not-reached until an answer
is encountered. Unanswered items preceding this last answered
item are taken as intentional omissions, and will be considered
in the next section. Concentrating on nonresponse due to
not-reached only, and limiting our attention to examinees who have
reached at least the first item, we must address n patterns of
missingness: for $\ell = 0, \ldots, n-1$, let $m_\ell$ denote the string of
$n-\ell$ 1's followed by $\ell$ 0's. That is, $m_\ell$ is the missingness
pattern of an examinee who has not reached the last $\ell$ items.
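The end-of-string rule just described is easy to state in code. The following is a minimal sketch; the encoding of a response string as a list holding 1, 0, or `None` (unanswered), and the function names, are assumptions made for illustration.

```python
def classify_missing(responses):
    """Label each item as answered, omitted, or not_reached.

    responses: list with 1 (right), 0 (wrong), or None (unanswered).
    Trailing unanswered items are not-reached; unanswered items that
    precede the last answered item are treated as intentional omissions.
    """
    last_answered = -1
    for idx in range(len(responses) - 1, -1, -1):
        if responses[idx] is not None:
            last_answered = idx
            break
    labels = []
    for idx, r in enumerate(responses):
        if r is not None:
            labels.append("answered")
        elif idx > last_answered:
            labels.append("not_reached")
        else:
            labels.append("omitted")
    return labels

def not_reached_pattern(responses):
    """Missingness pattern for not-reached items: 1 = reached, 0 = not reached."""
    return [0 if label == "not_reached" else 1
            for label in classify_missing(responses)]

print(classify_missing([1, None, 0, None, None]))
print(not_reached_pattern([1, None, 0, None, None]))
```

In this sketch an embedded omission counts as a reached item, so the returned pattern always has the form of $n-\ell$ ones followed by $\ell$ zeros.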

Checking ignorability. We continue to assume that a common
IRT model holds for the responses of items reached, $u_{obs}$, and
not-reached, $u_{mis}$. This assumption is crucial for applying IRT
models to data with not-reached items, and two ways of checking it
will be discussed at the end of the section. When it does hold,
the missingness process is characterized by the examinee speed
parameter $\phi = (\phi_0, \ldots, \phi_{n-1})$ of a multinomial variable,
where $\phi_\ell$ is the probability that missingness pattern $m_\ell$ will
be observed--i.e., that the last $\ell$ items will not be reached. Under
this formulation, the probability of the complete observation (m,u) is
obtained as

$$p(m, u \mid \theta, \phi) = p(m \mid \phi) \prod_{j=1}^{n} p(u_j \mid \theta) :$$

MCAR (and therefore MAR) holds. The probability of v is

$$p(v \mid \theta, \phi) = p(m \mid \phi) \int p(u \mid \theta) \, du_{mis} = p(m \mid \phi) \, p(u_{obs} \mid \theta) .$$

If, in addition to MCAR, all values of $\theta$ are possible at all
values of $\phi$--even if some are more likely than others--the
not-reached missingness process is ignorable with respect to
direct likelihood inference. That is, direct likelihood inference
about $\theta$ in the presence of not-reached items can be based on
$L^*(\theta \mid u_{obs})$. Sampling-distribution inferences about $\theta$
from $\hat\theta$ are also appropriate. They pertain to repeated sampling
of responses to the items that were reached, and enjoy the asymptotic
sampling properties of MLEs if the number of items reached is large.

For ignorability to hold under Bayesian inference, it is
necessary in addition to MAR that $p(\theta, \phi) = p(\theta)\, p(\phi)$; that is,
that "speed" and "ability" are independent. Empirical evidence
suggests that this is not generally true. Van den Wollenberg
(1979), for example, reports significant positive correlations
between percent-correct scores on the first eleven items (which
were reached by all examinees) and the total number of items
reached, in four of six intelligence tests in the ISI battery
(Snijders, Souren, and Welten, 1963). Bayesian inference about
$\theta$ would take this relationship into account by using the correct
posterior distribution

$$\begin{aligned}
p(\theta \mid v) &\propto \int L(\theta, \phi \mid v) \, p(\theta, \phi) \, d\phi \\
&= L^*(\theta \mid u_{obs}) \, p(\theta) \int L(\phi \mid m) \, p(\phi \mid \theta) \, d\phi \\
&\propto L^*(\theta \mid u_{obs}) \, p(\theta \mid m) .
\end{aligned}$$

Checking the IRT model. Verifying MCAR for not-reached items
required assuming that the responses that would have been
observed, had those items been reached, follow the same IRT model
as those that were reached. We now describe two ways of checking
this assumption, one using only the response data v that are
normally observed, the other requiring the researcher to discover
not-reached responses in a supplemental data-gathering effort.

A necessary condition for an IRT model to hold in the
presence of not-reached items is that the same IRT model hold
for reached items among examinees who have reached different
numbers of items. Let $u_{obs}$ be the observed responses to items
that are reached in a sample of N examinees, and let $\mathit{nobs}_i$ be the
number of items examinee i reaches. Let $\beta_j$ be the parameter(s) of
item j. The marginal probability of $u_{obs}$ under the hypothesis
that item parameters are invariant over groups of examinees with
different missingness patterns is

$$P_A(u_{obs}) = \prod_{i} \int \prod_{j=1}^{\mathit{nobs}_i} p(u_{ij} \mid \theta, \beta_j) \, p(\theta \mid m_i) \, d\theta . \tag{13}$$

Viewing (13) as a likelihood function and maximizing with respect
to $\beta_1, \ldots, \beta_n$ yields the value $L_A$.

An alternative hypothesis is that item parameters vary over
not-reached groups. We can estimate n-j+1 different item
parameters for item j, where $\beta_{j\ell}$ applies to those examinees who
have reached $n-\ell$ items. For example, Item 3 will have parameters
for groups who reached n, n-1, ..., 4, and 3 items. Writing
$\ell_i = n - \mathit{nobs}_i$ for the number of items examinee i did not
reach, the marginal probability of $u_{obs}$ under this hypothesis is

$$P_B(u_{obs}) = \prod_{i} \int \prod_{j=1}^{\mathit{nobs}_i} p(u_{ij} \mid \theta, \beta_{j\ell_i}) \, p(\theta \mid m_i) \, d\theta ,$$

which leads to the maximizing likelihood value $L_B$. In large
samples, $-2 \log(L_A / L_B)$ is approximately chi-square with n(n-1)/2
degrees of freedom when the null hypothesis is true.
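The degrees of freedom can be recovered by counting the item parameters that the alternative hypothesis adds. The count below is a sketch that assumes one free parameter per item per group, as in the Rasch model; with more parameters per item the total would scale accordingly.

```latex
% Item j is reached by the groups that reached j, j+1, ..., n items,
% so it carries n-j+1 parameters under the alternative and 1 under the null.
\sum_{j=1}^{n} \bigl[(n-j+1) - 1\bigr]
  \;=\; \sum_{j=1}^{n} (n-j)
  \;=\; \frac{n(n-1)}{2}
```

For example, with n = 4 items the extra parameters number 3 + 2 + 1 + 0 = 6.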

Van den Wollenberg (1979) provides empirical evidence that
the "item parameter invariance" with respect to not-reached groups
is often, but not always, tenable. Applying his own
goodness-of-fit indices rather than the likelihood ratio
suggested above, he verified this type of invariance in five of
the six ISI tests.

A second way of studying the IRT assumption in the presence
of not-reached items begins by finding out what the responses to
the not-reached items would have been. This can be accomplished
with paper-and-pencil tests by allowing a sample of examinees to
continue beyond the usual time limit until they have answered
every item, but using a different colored pencil after the usual
limit. Of the total of n items, then, examinee i will have
responded to the first $\mathit{nobs}_i$ under the normal time limits and the
remaining $\mathit{nmis}_i = n - \mathit{nobs}_i$ thereafter. Under the null
hypothesis of an invariant IRT model across reached and
not-reached items, the marginal probability of the completed response
matrix $u = (u_{obs}, u_{mis})$ is

$$P_C(u) = \prod_{i} \int \prod_{j=1}^{n} p(u_{ij} \mid \theta, \beta_j) \, p(\theta \mid m_i) \, d\theta .$$

Alternatively, we can fit an IRT model that allows both the item
parameters and examinee parameters to differ before and after the
time limit. Each item except the first can have two parameters,
$\beta_{j,obs}$ and $\beta_{j,mis}$, whenever some examinees answered it
before the time limit and some answered it after; each examinee can
have two abilities, $\theta_{obs}$ and $\theta_{mis}$. The resulting marginal
probability is

$$P_D(u) = \prod_{i} \int\!\!\int \Bigl[ \prod_{j=1}^{\mathit{nobs}_i} p(u_{ij} \mid \theta_{obs}, \beta_{j,obs}) \Bigr] \Bigl[ \prod_{j=\mathit{nobs}_i+1}^{n} p(u_{ij} \mid \theta_{mis}, \beta_{j,mis}) \Bigr] p(\theta_{obs}, \theta_{mis}) \, d\theta_{obs} \, d\theta_{mis} . \tag{14}$$

In large samples, $-2 \log(L_C / L_D)$ is approximately chi-square under
the null hypothesis, with degrees of freedom equal to the number
of items with two parameters appearing in (14), plus the number
of additional parameters estimated for the examinee parameter
distribution $p(\theta_{obs}, \theta_{mis})$ over those required for $p(\theta)$.




Case 5: Omitted Responses 

A missing response is an intentional omission when the
examinee is administered the item, has time to appraise its
content, and decides for his own reasons not to make a response.
After showing that such omissions can't generally be considered
ignorable, we discuss a number of ways to deal with them.

Omitting behavior. Test scores T(v) are assigned to
patterns of rights, wrongs, and omits for the purposes of
comparing or selecting examinees. Assuming that a correct
response to an item gives a higher value of T than an incorrect
response and that an examinee wants to obtain a high score, he
will make responses he believes are correct. How he will respond
to an item about which he is unsure depends at least partly on
how the test will be scored (Sabers and Feldt, 1968).

Formula scores, for example, take the form

$$T(v) = R(v) - X \, W(v) ,$$

where R(v) and W(v) are counts of right and wrong responses and X
is a constant selected by the test administrator. Setting X = 0
gives number-right scores; X = 1 gives right-minus-wrong scores;
for multiple-choice items with A alternatives, X = 1/(A-1) gives
the familiar "corrected-for-guessing" scores. The examinee
maximizes his expected score by answering items for which he
thinks his chances of being correct are at least X/(1 + X). In
particular, he should answer every item under number-right
scoring, and those for which he thinks his chances are at least
c = 1/A under corrected-for-guessing scoring. Some examinees either
do not use this strategy, or make inaccurate assessments of their
chances. Analyzing responses to items that examinees originally
omitted under right-minus-wrong scoring, Sherriffs and Boomer
(1954) found that about half, or X/(1 + X), of the omitted responses
would have been correct among examinees who scored low on a
risk-aversion scale, but nearly two-thirds would have been correct
among examinees with high risk-aversion scores.
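The answering threshold X/(1 + X) follows from requiring a nonnegative expected gain from responding: with perceived success probability p, answering adds p - X(1 - p) to the expected score. The sketch below computes formula scores and thresholds in exact arithmetic; the function names and the example counts are illustrative assumptions.

```python
from fractions import Fraction

def formula_score(n_right, n_wrong, penalty):
    """T(v) = R(v) - X * W(v); omitted items contribute nothing."""
    return n_right - penalty * n_wrong

def answering_threshold(penalty):
    """Perceived success probability at or above which answering does not
    lower the expected score: p - X * (1 - p) >= 0  <=>  p >= X / (1 + X)."""
    return penalty / (1 + penalty)

alternatives = 5                              # illustrative number of options
penalty = Fraction(1, alternatives - 1)       # corrected-for-guessing X
print(answering_threshold(penalty))           # 1/5, i.e. c = 1/A
print(formula_score(30, 8, penalty))          # 28
```

Setting the penalty to 0 gives a threshold of 0 (answer everything), and setting it to 1 gives a threshold of 1/2, matching the cases in the text.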

The examinee's perceived probabilities of correct response
must be distinguished from the probabilities of the IRT model.
IRT gives the proportion of correct response to Item j from
examinees with ability $\theta$, but each of these examinees may have a
different idea of his own chances. They may differ in the
accuracy of their estimates and their confidence about them, and
their perceived probabilities need not average to the IRT
probability. Observing whether an examinee omits an item merely
tells us something about what he thinks $u_j$ would be.

Are omits ignorable? To see that the assumptions needed for
ignorability are not generally plausible, we examine the case in
which n = 1; i.e., a single item. MAR simplifies to
$g_\phi(m \mid U=0) = g_\phi(m \mid U=1)$, meaning that an omit is just as
likely if the response would have been correct as if it would have
been incorrect. But since examinees tend to answer items they feel
are correct, MAR implies the unappealing assumption that their
perceptions of correctness are independent of actual correctness.

MAR (along with D) is merely sufficient for ignorability,
however, and ignorability can hold when MAR does not. For a
single item, the necessary condition for ignorability under
likelihood inference given in (9) requires that for each value of
$\phi$, the expression

$$g_\phi(M=0 \mid u=0) \, f_\theta(U=0) + g_\phi(M=0 \mid u=1) \, f_\theta(U=1) \tag{15}$$

take the same value for all $\theta$. This is just the probability
that the item will be omitted. That its value remains constant as
$\theta$ increases without limit flouts intuition, since we'd expect
examinees whose high abilities virtually assure a correct response
to be aware of their high chances, and respond rather than omit.
This conjecture is borne out in empirical studies such as
Stocking, Eignor, and Cook (1988) that show markedly lower rates
of omission from examinees with high (corrected-for-guessing)
scores than from examinees with low scores.

Since ignorability is not satisfied for direct likelihood
inference, $L^*$ does not generally yield the correct MLE, and
sampling distribution inferences based on the resulting value are
inappropriate. The requirements for ignorability under Bayesian
inference are the same as those for direct likelihood inference
except that they must apply when averaged over $\phi$ rather than for
each particular value; ignorability is thus implausible there too.
Lord (1974) argues against ignoring omits under maximum
likelihood scoring, saying that the examinee who knew we planned
to use the MLE from $L^*(\theta \mid u_{obs})$ as a score would omit all items
except those for which he was certain his response would be
correct. This plausible argument also presumes a relationship
between actual and examinee-perceived item correctness.

Filling in the blanks. Lord (1974) suggested that omits on
multiple-choice items under guessing-corrected scoring can be
handled with standard IRT estimation routines if they are treated
as fractionally correct, with value c. He assumed "rational"
omitting behavior: examinees omit items only if their chances of
responding correctly would have been c, so that
$h_\theta(U_j = 1 \mid m_j = 0) = c$ for all items and all $\theta$. Omitting
decisions are also assumed to be independent from one item to the
next, given $\theta$ and $\phi$. In a natural extension of conditional
independence of item responses given $\theta$, we assume "extended local
independence," or conditional independence of item responses and
missingness given $\theta$ and $\phi$:

$$P(U=u, M=m \mid \theta, \phi) = \prod_{j} P(U_j = u_j, M_j = m_j \mid \theta, \phi) .$$

Under these assumptions, the complete-data likelihood takes
the following form:

$$\begin{aligned}
L(\theta, \phi \mid u, m) &= \prod_{j=1}^{n} p(u_j, m_j \mid \theta, \phi) \\
&= \prod_{j=1}^{n} f_\theta(u_j) \, g(m_j \mid u_j, \theta, \phi) \\
&= \prod_{j} P_j(\theta)^{u_j} Q_j(\theta)^{1-u_j} \prod_{j} g(m_j \mid u_j, \theta, \phi) \\
&= L_u(\theta \mid u) \; L_{m|u}(\theta, \phi \mid (m, u)) ,
\end{aligned}$$

where $P_j(\theta) = f_\theta(U_j = 1)$ and $Q_j(\theta) = 1 - P_j(\theta)$.



The complete-data likelihood thus factors into two terms, with
$L_u$ being the IRT-based probability of item responses and
$L_{m|u}$ the probability of the missingness pattern given the response
pattern. Both depend on $\theta$. Were u and m both fully observed,
the usual MLE based on $L_u$ would be a conditional MLE, foregoing
the additional information conveyed by $L_{m|u}$ but avoiding the
nuisance parameter $\phi$. One would proceed by finding the maximizing
value of the log likelihood

$$\ell_u(\theta \mid u) = \sum_{j=1}^{n} \bigl[ u_j \log P_j(\theta) + (1 - u_j) \log Q_j(\theta) \bigr] .$$

The same conditional-estimation strategy can be applied to
the observed data $v = (u_{obs}, m)$, by maximizing the conditional
expectation of $L_u$, or $E[L_u(\theta \mid u_{mis}, u_{obs}) \mid (u_{obs}, m)]$.
Finding the maximizing $\theta$ by Dempster, Laird, and Rubin's (1977) EM
algorithm requires finding the maximizing $\theta$ for the expectation of
$\ell_u$, given a provisional estimate $\theta^0$; that is, of

$$\begin{aligned}
F(\theta \mid \theta^0) &= \int \ell_u(\theta \mid u_{mis}, u_{obs}) \, p(u_{mis} \mid u_{obs}, m, \theta = \theta^0) \, du_{mis} \\
&= \sum_{obs} \bigl[ u_j \log P_j(\theta) + (1 - u_j) \log Q_j(\theta) \bigr] \\
&\quad + \int \Bigl[ \sum_{mis} u_j \log P_j(\theta) + (1 - u_j) \log Q_j(\theta) \Bigr] p(u_{mis} \mid u_{obs}, m, \theta = \theta^0) \, du_{mis} .
\end{aligned} \tag{16}$$

But under Lord's assumptions,

$$p(u_{mis} \mid u_{obs}, m, \theta = \theta^0) = \prod_{mis} h_\theta(u_j \mid m_j = 0) = \prod_{mis} c^{u_j} (1 - c)^{1 - u_j} ,$$

a value that doesn't depend on $\phi$ or $\theta$. Substituting this
expression into the second term of (16), the integral simplifies
to

$$F(\theta \mid \theta^0) = \sum_{obs} \bigl[ u_j \log P_j(\theta) + (1 - u_j) \log Q_j(\theta) \bigr] + \sum_{mis} \bigl[ c \log P_j(\theta) + (1 - c) \log Q_j(\theta) \bigr] . \tag{17}$$

This is equivalent to the log of Lord's Equation 4, the
pseudo-likelihood obtained by using $u_j = c$ in the complete-data
likelihood whenever $m_j = 0$. Equation (17) does not depend on $\theta^0$,
so the EM algorithm comprises only a single cycle. Maxima of (17)
are maxima of $E(L_u)$. A global maximum is assured if the
complete-data probability belongs to the exponential family, as is
the case with the Rasch model.
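To make (17) concrete, the sketch below maximizes it for a single examinee under the Rasch model, where the derivative of (17) with respect to $\theta$ reduces to the sum of $u_j - P_j(\theta)$ with c substituted for each omitted response. This is an illustrative Newton-Raphson routine under assumed, known item difficulties, not Lord's own procedure; the function names and example values are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mle_theta_fractional(responses, difficulties, c, tol=1e-10, max_iter=100):
    """Maximize (17) over theta, treating each omit (None) as the value c.

    Under the Rasch model the gradient of (17) is sum(u_j - P_j(theta))
    and the negative second derivative is sum(P_j * (1 - P_j)).
    """
    filled = [c if u is None else float(u) for u in responses]
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = sum(u - p for u, p in zip(filled, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]   # assumed known item difficulties
responses = [1, 1, 1, None, 0]               # None marks an omitted item
print(mle_theta_fractional(responses, difficulties, c=0.2))
```

The routine has no safeguard for response patterns whose filled-in score equals 0 or the number of items, for which no finite maximum exists.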

Lord (1974) points out that the criterion function obtained
by replacing omits with fractionally-correct responses is not the
likelihood function induced by the observation. We have shown,
however, that the resulting estimate of $\theta$ maximizes what might be
called a "marginal conditional" likelihood function, allowing one
to apply standard results from the theory of maximum likelihood
estimation, such as consistency--in this context, as the number of
items not omitted increases.

The foregoing analysis yields insight into other treatments
of omits that impute values for $u_{mis}$. Supplying random responses
that are correct with probability c provides a crude numerical
evaluation of (16), leading to a maximizing value whose
expectation is the value obtained when the integration is carried
out in closed form as in (17). This practice is justified by the
same assumptions as Lord's (1974) approach, but sacrifices
accuracy for convenience. Supplying incorrect responses for
omits leads to a "marginal conditional" MLE for $\theta$ under the
assumption that responses to omitted items would surely have been
incorrect. This may be reasonable for open-ended items, but it is
not plausible for multiple-choice items for which even the least
able examinees have nontrivial probabilities of success. In these
cases, supplying incorrect responses for omits would bias
estimates of $\theta$ downward.

Lord addressed "rational" omitting behavior, in that the
expectation of correctness for an omitted response is always c,
the value associated with the optimal omitting strategy. As we
have noted, however, studies of responses to items originally
omitted show that not all examinees behave in this manner. The
tendency to omit when probabilities of success may be higher than
c can be associated with personality characteristics, demographic
variables, and level of ability. Assuming rational omitting then
biases estimates of $\theta$ downward for risk-averse examinees. We now
discuss how such dependencies can be taken into account, although it
is by no means certain that this should be done; to do so effectively
adjusts scores upward or downward in accordance with examinee
background characteristics, which may be objectionable on the
grounds of fairness. Assuming rational omitting behavior in
scoring rules, and making the rules and optimal strategies as
clear as possible to examinees, may be preferable when test
scores are used to make sensitive placement or selection
decisions.

Modeling empirical rather than ideal omitting behavior
requires a study like that of Sherriffs and Boomer (1954),
where examinees are first administered a test under standard
conditions, then later asked what their responses to the items
they omitted would have been. From these data it is possible to
calculate proportions of omits that would have been correct as a
function of the items and examinee characteristics--possibly
including $\theta$. If $\theta$ is not included, empirical estimates
$h(U_j = 1 \mid M_j = 0, y)$ are employed in place of c in (16). This takes
into account possible differences in rates of omitted correct
response from one item to the next--some higher than c, some
lower--or among examinees with different demographic or measured
psychological characteristics. If $\theta$ is included, then estimates
of $\theta$ employing $h_\theta(U_j = 1 \mid M_j = 0, y)$ must be calculated
iteratively. The values $h_{\theta^0}(U_j = 1 \mid M_j = 0, y)$ replace c in
(17) for each missing response, and an improved estimate $\theta^1$ is
obtained via maximum likelihood. This must then be used to produce an
improved estimate of the expectation of each missing response,
$h_{\theta^1}(U_j = 1 \mid M_j = 0, y)$. From these and $u_{obs}$, yet another
estimate will be obtained. The process is repeated until convergence
occurs. The original estimation of item parameters and of the
functions $h_\theta(U_j = 1 \mid M_j = 0, y)$ requires similar modifications to
standard item parameter estimation algorithms. Additional
parameters for $h_\theta(U_j = 1 \mid M_j = 0, y)$ can be estimated jointly with
standard item parameters; a plausible implementation would make
the logits of h's associated with each item linear in $\theta$.

Lord's (1983) model for omits. While Lord's (1974)
treatment of omits as fractionally correct yields reasonable and
statistically defensible (conditional) MLEs of $\theta$ when rational
omitting behavior is assumed, the full likelihood induced by the
data was neither presented nor exploited. To accomplish this
requires an explication of the missingness process, in the form of
a model for the joint probability distribution of U and M. Such a
model was proposed by Lord (1983).

Lord's (1983) model for omits maintains the context of
guessing-corrected scoring of multiple-choice items with A (= 1/c)
alternatives, but offers additional structure for the response
process. It is first assumed that an examinee either feels a
preference for one of the alternatives or is totally undecided
among them. The proportion of examinees with ability $\theta$ feeling
no preference on Item j is $R_j(\theta)$. If a preference is felt, a
response is made; of the responses made by examinees with ability
$\theta$ who feel a preference, the proportion correct is $P^*_j(\theta)$. If no
preference is felt, the examinee will either omit the item with
probability $\omega$ or respond completely at random. Responses and
omitting decisions are again assumed to be independent from one
item to the next, conditional on $\theta$ and $\omega$.

These assumptions imply that the missingness parameter $\phi$ is
$\omega$, and lead to the following forms for $t_{\theta\phi}(m_j)$ and
$h_{\theta\phi}(u_j \mid m_j)$:

$$t_{\theta\phi}(m_j) = [\omega R_j(\theta)]^{1-m_j} \, [1 - \omega R_j(\theta)]^{m_j}$$

and

$$h_{\theta\phi}(u_j \mid m_j) =
\begin{cases}
c^{u_j} (1 - c)^{1-u_j} & \text{if } m_j = 0 \\
P^{**}_j(\theta)^{u_j} \, [1 - P^{**}_j(\theta)]^{1-u_j} & \text{if } m_j = 1 ,
\end{cases}$$

where $P^{**}_j(\theta) = P(U_j = 1 \mid \theta, m_j = 1)$, the conditional probability that an
observed response will be correct, is the sum of the probabilities
of responding correctly when a preference is felt and guessing
correctly when a preference is not felt:

$$P^{**}_j(\theta) = [1 - R_j(\theta)] \, P^*_j(\theta) + c (1 - \omega) R_j(\theta) .$$



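The joint likelihood for $\theta$ and $\omega$ induced by v (i.e., $u_{obs}$ and m)
is thus

$$\begin{aligned}
L(\theta, \omega \mid v) &= t_{\theta\phi}(m) \int \prod_{j=1}^{n} h_{\theta\phi}(u_j \mid m_j) \, du_{mis} \\
&= t_{\theta\phi}(m) \prod_{obs} h_{\theta\phi}(u_j \mid M_j = 1) \int \prod_{mis} h_{\theta\phi}(u_j \mid M_j = 0) \, du_{mis} \\
&= \prod_{j=1}^{n} [\omega R_j(\theta)]^{1-m_j} [1 - \omega R_j(\theta)]^{m_j} \prod_{obs} P^{**}_j(\theta)^{u_j} [1 - P^{**}_j(\theta)]^{1-u_j} .
\end{aligned} \tag{18}$$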



Assuming the functions $P^{**}$ and R are known, (18) provides a basis
for full-information inference about $\theta$. Under maximum
likelihood, the joint maximum for $(\theta, \omega)$ may be found by standard
numerical methods, and a large-sample variance estimate can be
based on the inverse of the second derivative of the log of (18)
with respect to $\theta$ and $\omega$. Under Bayesian inference, the posterior
for $\theta$ and $\omega$ is obtained by multiplying (18) by $p(\theta, \omega)$ and
normalizing; from this point, one may examine characteristics of
the joint posterior for $\theta$ and $\omega$, or integrate $\omega$ out to obtain the
marginal posterior for $\theta$.

Lord suggests that this model might be implemented by
specifying functional forms for $P^*$ and R, e.g., the 3-parameter
logistic IRT function for $P^*$ and the 2-parameter logistic with a
negative slope for R. The underlying model for the correctness of
item responses, observed or not, can be written as a function of
$P^*$ and R as follows:

$$\begin{aligned}
P(U_j = 1 \mid \theta, \omega) &= P(U_j = 1 \mid \theta, \omega, M_j = 1) \, P(M_j = 1 \mid \theta, \omega) + P(U_j = 1 \mid \theta, \omega, M_j = 0) \, P(M_j = 0 \mid \theta, \omega) \\
&= \{ c (1 - \omega) R_j(\theta) + [1 - R_j(\theta)] P^*_j(\theta) \} \, [1 - \omega R_j(\theta)] + c \, \omega R_j(\theta) .
\end{aligned}$$

Note that this probability depends on $\omega$ as well as $\theta$; thus, the
underlying model for item responses is not a standard IRT model
depending on $\theta$ alone and exhibiting local independence. This
would be true only if for each value of $\theta$, all examinees with that
$\theta$ had the same value of $\omega$. A special case of this requirement is
for all examinees at all values of $\theta$ to have the same value of $\omega$.
Lord points out that if this were so with $\omega = 0$--i.e., no
propensity toward omitting, even when no preference is felt--the
resulting IRT model would be

$$P_j(\theta) = P^*_j(\theta) \, [1 - R_j(\theta)] + c \, R_j(\theta) .$$

In a manner described by Samejima (1979), a response curve of
this form need not be monotonically increasing over the range of
$\theta$. High-ability examinees would tend to feel preferences and
respond correctly; moderate-ability examinees might tend to feel a
preference for a clever distractor and answer incorrectly at a
rate lower than c; very low ability examinees would feel no
preference at all, and answer correctly at a rate equal to c.
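The non-monotone shape can be seen numerically from the $\omega = 0$ curve. The sketch below is illustrative only: it assumes 2-parameter logistic forms for both $P^*$ and R (a simplification of the 3-parameter form mentioned for $P^*$), and the slopes, locations, and function names are hypothetical values chosen to produce a visible dip.

```python
import math

def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def response_curve(theta, c, a_pref=2.0, b_pref=1.0, a_none=2.0, b_none=-1.0):
    """P_j(theta) = P*_j(theta) [1 - R_j(theta)] + c R_j(theta), with omega = 0.

    P* (correct, given a preference) rises with theta; R (no preference
    felt) falls with theta, i.e. it has a negative slope.
    """
    p_star = logistic(a_pref * (theta - b_pref))
    r = logistic(-a_none * (theta - b_none))
    return p_star * (1.0 - r) + c * r

c = 0.25                                   # four alternatives
for theta in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(theta, round(response_curve(theta, c), 3))
```

With these values the curve starts near c at very low ability, dips below c at moderate ability, and then rises toward 1.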
Nominal category models. IRT models for multiple-category
items have been proposed by Bock (1972), Samejima (1979), Sympson
(1983), Thissen and Steinberg (1986), and others. These models
have sometimes been used for data with intentional omissions,
with an omit treated as one more possible response to a
multiple-choice item. Lord (1983) expresses reservations about this
practice,

"...since it treats probability of omitting as dependent
only on the examinee's ability, whereas it actually
depends on a dimension of temperament. It seems likely
that local unidimensional independence may not hold."
(p. 477)

The following analysis makes Lord's concerns more explicit.

The features of the approach regarding omission are retained
when all overt incorrect responses are collapsed into a single
category. Recalling that the values 0, 1, and * of v stand for
observed wrong, observed right, and omit, we obtain the
multiple-category model probabilities as follows:

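$$\begin{aligned}
f^*_\theta(V_j = 0) &= P(U_j = 0, M_j = 1 \mid \theta) \\
&= \int f_\theta(U_j = 0) \, g(M_j = 1 \mid u_j = 0, \theta, \phi) \, p(\phi \mid \theta) \, d\phi ;
\end{aligned}$$

$$\begin{aligned}
f^*_\theta(V_j = 1) &= P(U_j = 1, M_j = 1 \mid \theta) \\
&= \int f_\theta(U_j = 1) \, g(M_j = 1 \mid u_j = 1, \theta, \phi) \, p(\phi \mid \theta) \, d\phi ;
\end{aligned}$$

and

$$\begin{aligned}
f^*_\theta(V_j = *) &= P(U_j = 0, M_j = 0 \mid \theta) + P(U_j = 1, M_j = 0 \mid \theta) \\
&= \int f_\theta(U_j = 1) \, g(M_j = 0 \mid u_j = 1, \theta, \phi) \, p(\phi \mid \theta) \, d\phi \\
&\quad + \int f_\theta(U_j = 0) \, g(M_j = 0 \mid u_j = 0, \theta, \phi) \, p(\phi \mid \theta) \, d\phi .
\end{aligned}$$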



Under the assumption of "extended local independence,"

$$p(U = u, M = m \mid \theta, \phi) = \prod_{j} p(U_j = u_j, M_j = m_j \mid \theta, \phi) .$$

This implies

$$p(V = v \mid \theta, \phi) = \prod_{j} p(V_j = v_j \mid \theta, \phi) . \tag{19}$$

Using (19),

$$\begin{aligned}
p(V = v \mid \theta) &= \int p(V = v \mid \theta, \phi) \, p(\phi \mid \theta) \, d\phi \\
&= \int \prod_{j} p(V_j = v_j \mid \theta, \phi) \, p(\phi \mid \theta) \, d\phi .
\end{aligned} \tag{20}$$

But for local independence to hold in the usual sense for the
multiple-category model, it would be necessary that

$$P(V = v \mid \theta) = \prod_{j} \int P(V_j = v_j \mid \theta, \phi) \, p(\phi \mid \theta) \, d\phi ,$$

and this does not generally follow from (20), as the order of
multiplication and integration has been interchanged. It does
follow if for each $\theta$ value, $\phi$ takes the same value for all
examinees with that $\theta$ value; that is, the variables of the
omitting process may vary from one value of $\theta$ to another, but not
among examinees with the same value of $\theta$. Lord's objection, then,
may be stated as a desire to allow for different propensities for
omitting to occur within a given level of ability.
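A small numerical check shows why the interchange matters. The sketch below fixes one value of $\theta$, lets $\phi$ take a few discrete values, and compares the probability that every item is omitted, computed as in (20), with the product of the single-item omission probabilities. The logistic form of the omission probability, the discrete distribution for $\phi$, and the function names are all assumptions made for illustration.

```python
import math

def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def omit_joint_and_product(phi_values, phi_weights, item_logits):
    """At one fixed theta, return (joint, product) for the event that
    every item is omitted.

    joint   integrates the product over phi, as in (20);
    product multiplies the separately integrated item probabilities,
            as ordinary local independence would require.
    """
    joint = sum(w * math.prod(logistic(a + phi) for a in item_logits)
                for phi, w in zip(phi_values, phi_weights))
    marginals = [sum(w * logistic(a + phi)
                     for phi, w in zip(phi_values, phi_weights))
                 for a in item_logits]
    return joint, math.prod(marginals)

# Two equally likely omitting propensities at the same theta.
print(omit_joint_and_product([-1.0, 1.0], [0.5, 0.5], [-1.0, -0.5]))
# A single propensity shared by all examinees at this theta.
print(omit_joint_and_product([0.3], [1.0], [-1.0, -0.5]))
```

In the first call the joint probability exceeds the product, because omissions on the two items are positively associated through $\phi$; in the second call the two quantities agree.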

A second reservation that might be offered for this approach
stems from the fact that probabilities for v given $\theta$ are averages
over $\phi$. Even if (i) $f_\theta(u)$ is an IRT model satisfying local
independence and (ii) $f_\theta(u) \, g_\phi(m \mid u)$ satisfies extended local
independence, the multiple-category response curves $f^*_\theta(v)$ will
vary from one group of examinees to the next unless the
conditional distributions $p(\phi \mid \theta)$ are invariant over groups.



How to model omits, if you must. Standard IRT concerns
examinees' tendencies to make correct responses when omits cannot
occur. When they can occur, the differences among examinees'
tendencies to omit responses can be cast as a nuisance variable in
the classic sense. It is often easier to deal with such
extraneous influences at the time the data are collected than to
model them after the fact. In aptitude and ability testing, we
should inform examinees as clearly as possible how their
performance will be evaluated, and persuade them as convincingly
as possible to use the omitting strategy that maximizes their
expected score. To the degree we succeed, variation in $\phi$ is
reduced and examinees' data differ mainly because of differences
in $\theta$. If, in addition, the proportion of omits is low, imputing
fractionally-correct or even random responses at the level c
yields inferences that are plausible, readily calculable, and
robust with respect to alternative models for omitting.

For the sake of completeness, however, we now outline an
approach using a full model for response and omission. The model
exhibits local independence for elements of U given $\theta$, and
extended local independence for elements of (U,M) given $(\theta, \phi)$.
Its implementation requires either that g is assumed to be known
or that an experiment with the same items and similar examinees
has revealed the values of item responses that were originally
omitted. We assume here that the experiment has been carried out,
and a complete data matrix (u,m) is available for a sample of
examinees from a population of interest.

An IRT model $f(U = u \mid \theta, \beta)$ is assumed for item 
responses. The missingness process is modeled in terms of functions 
$g(M_j = m_j \mid U_j = u_j, \theta, \phi, \eta_j)$ for each item, 
where $\eta_j$ are now additional item parameters for the frequency 
with which the item is omitted. For example, we could estimate from 
the completed data set item-omitting parameters 
$\eta_j = (d_{j0}, e_{j0}, d_{j1}, e_{j1})$ that give the logit 
regression of $m_j$ on $\theta$ when $u_j = 0$ and when $u_j = 1$; 
that is, 

$$\operatorname{logit} P(M_j = 0 \mid U_j = 0, \theta, \eta_j) = d_{j0}\,\theta + e_{j0}$$

$$\operatorname{logit} P(M_j = 0 \mid U_j = 1, \theta, \eta_j) = d_{j1}\,\theta + e_{j1},$$

where logit P = log(P/(1 - P)). The examinee omitting parameter 
$\phi$ could then be a tendency to omit more or less than average, 
so that 

$$\operatorname{logit} P(M_j = 0 \mid U_j = 0, \theta, \phi, \eta_j) = d_{j0}\,\theta + e_{j0} + \phi$$

$$\operatorname{logit} P(M_j = 0 \mid U_j = 1, \theta, \phi, \eta_j) = d_{j1}\,\theta + e_{j1} + \phi.$$
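As a purely illustrative aid, the logit model for the missingness indicator can be evaluated numerically. The Python sketch below assumes hypothetical values for the item-omitting parameters; the function names `logistic` and `prob_m0` are ours and do not come from any IRT package.

```python
import math


def logistic(x):
    """Inverse logit: maps log-odds x to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_m0(u_j, theta, phi, eta_j):
    """P(M_j = 0 | U_j = u_j, theta, phi, eta_j) under the logit model.

    eta_j = (d_j0, e_j0, d_j1, e_j1): slope and intercept of the logit
    regression on theta when u_j = 0 and when u_j = 1.
    """
    d_j0, e_j0, d_j1, e_j1 = eta_j
    slope, intercept = (d_j0, e_j0) if u_j == 0 else (d_j1, e_j1)
    return logistic(slope * theta + intercept + phi)


# Hypothetical item-omitting parameters, chosen only for illustration.
eta_j = (-0.5, -1.0, -0.5, -3.0)

p_if_wrong = prob_m0(0, theta=0.0, phi=0.0, eta_j=eta_j)  # logit = -1
p_if_right = prob_m0(1, theta=0.0, phi=0.0, eta_j=eta_j)  # logit = -3
p_shifted = prob_m0(0, theta=0.0, phi=1.0, eta_j=eta_j)   # logit = 0
```

With these made-up values, raising the examinee parameter by one logit moves the probability for an incorrect response at theta = 0 from about 0.27 to 0.5, which is the sense in which the examinee parameter shifts every item's logit by the same amount.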

The complete-data likelihood function for the item parameters is 

$$L(\beta, \eta \mid u, m) = \prod_{i=1}^{N} \iint f(u_i \mid \theta, \beta)\, g(m_i \mid u_i, \theta, \phi, \eta)\, p(\theta, \phi)\, d\theta\, d\phi. \tag{21}$$

Equation (21) provides a basis for estimating $\beta$ and $\eta$ from 
the experimental data, either directly via maximum likelihood or, 
after multiplication by a prior distribution, via Bayesian 
methods. Maximum likelihood yields point estimates 
$(\hat\beta, \hat\eta)$; Bayesian methods yield the posterior 
distribution $p(\beta, \eta \mid u, m)$. Estimating the examinee 
parameter distribution $p(\theta, \phi)$ at the same time might also 
be desirable, say by positing a functional form 
and estimating its parameters. 
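To make the structure of the complete-data likelihood concrete, the following Python sketch evaluates one examinee's factor by brute-force quadrature on a rectangular grid. Everything specific here is an assumption made for illustration: a two-parameter logistic response model for $f$, the logit model for $g$, independent standard normal distributions for the two examinee parameters, and hypothetical parameter values.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f_resp(u, theta, beta):
    """Probability of response vector u given theta (local independence, 2PL)."""
    out = 1.0
    for u_j, (a_j, b_j) in zip(u, beta):
        p = logistic(a_j * (theta - b_j))
        out *= p if u_j == 1 else 1.0 - p
    return out


def g_miss(m, u, theta, phi, eta):
    """Probability of indicator vector m given u (extended local independence)."""
    out = 1.0
    for m_j, u_j, (d0, e0, d1, e1) in zip(m, u, eta):
        slope, intercept = (d0, e0) if u_j == 0 else (d1, e1)
        p0 = logistic(slope * theta + intercept + phi)  # P(M_j = 0 | ...)
        out *= p0 if m_j == 0 else 1.0 - p0
    return out


def examinee_term(u, m, beta, eta, lo=-4.0, hi=4.0, n=41):
    """One examinee's double integral over (theta, phi), rectangle rule."""
    step = (hi - lo) / (n - 1)
    grid = [lo + k * step for k in range(n)]
    total = 0.0
    for theta in grid:
        f_val = f_resp(u, theta, beta) * normal_pdf(theta)
        for phi in grid:
            total += f_val * g_miss(m, u, theta, phi, eta) * normal_pdf(phi)
    return total * step * step


def log_likelihood(U, M, beta, eta):
    """Log of the product of examinee terms over the sample."""
    return sum(math.log(examinee_term(u, m, beta, eta)) for u, m in zip(U, M))


# Hypothetical two-item calibration data and parameter values.
beta = [(1.0, -0.5), (1.2, 0.5)]
eta = [(-0.5, -1.0, -0.5, -3.0)] * 2
U = [(1, 0), (1, 1), (0, 0)]
M = [(1, 1), (1, 0), (0, 1)]
loglik = log_likelihood(U, M, beta, eta)

# Sanity check: the terms for all 16 (u, m) patterns should sum to about 1.
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
total_mass = sum(examinee_term(u, m, beta, eta) for u in patterns for m in patterns)
```

A practical calibration would hand `log_likelihood` to a numerical optimizer and would normally use Gauss-Hermite or adaptive quadrature rather than a fixed grid; the grid is used here only to keep the sketch dependency-free.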

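The results of this calibration can then be used to estimate 
the $\theta$ value of a new examinee $i$, from whom only $v_i$ is 
observed. Under Bayesian inference, the relevant posterior 
distribution is 

$$p(\theta \mid v_i, u, m) \propto \iiint p(v_i \mid \theta, \phi, \beta, \eta)\, p(\beta, \eta \mid u, m)\, p(\theta, \phi)\, d\beta\, d\eta\, d\phi.$$

Under maximum likelihood inference, the maximizing value 
$(\hat\theta, \hat\phi)$ of $L(\theta, \phi \mid v_i, \hat\beta, \hat\eta)$ 
might be sought. 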
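A minimal sketch of the ability estimate for a new examinee follows. It makes several simplifying assumptions that are ours rather than the text's: the calibrated item parameters are plugged in as fixed point estimates instead of being integrated over their posterior, $M_j = 0$ is taken to mark an omitted response, the response model is two-parameter logistic, and the two examinee parameters are independent standard normal. All numerical values are hypothetical.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def item_term(v_j, theta, phi, beta_j, eta_j):
    """P(v_j | theta, phi); v_j is 0 or 1 if observed, None if omitted."""
    a_j, b_j = beta_j
    d0, e0, d1, e1 = eta_j
    p = logistic(a_j * (theta - b_j))                # P(U_j = 1)
    omit_if_wrong = logistic(d0 * theta + e0 + phi)  # P(M_j = 0 | U_j = 0)
    omit_if_right = logistic(d1 * theta + e1 + phi)  # P(M_j = 0 | U_j = 1)
    if v_j is None:  # sum over the unobserved response
        return (1.0 - p) * omit_if_wrong + p * omit_if_right
    if v_j == 1:
        return p * (1.0 - omit_if_right)
    return (1.0 - p) * (1.0 - omit_if_wrong)


def theta_posterior(v, beta, eta, lo=-4.0, hi=4.0, n=81):
    """Grid posterior for theta, with phi summed out on the same grid."""
    step = (hi - lo) / (n - 1)
    grid = [lo + k * step for k in range(n)]
    weights = []
    for theta in grid:
        total = 0.0
        for phi in grid:
            like = 1.0
            for v_j, beta_j, eta_j in zip(v, beta, eta):
                like *= item_term(v_j, theta, phi, beta_j, eta_j)
            total += like * normal_pdf(phi)
        weights.append(total * normal_pdf(theta))
    norm = sum(weights)
    return grid, [w / norm for w in weights]


def posterior_mean(v, beta, eta):
    grid, post = theta_posterior(v, beta, eta)
    return sum(t * w for t, w in zip(grid, post))


# Hypothetical plug-in estimates for three items.
beta_hat = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
eta_hat = [(-0.5, -1.0, -0.5, -3.0)] * 3

grid, post = theta_posterior([1, 1, None], beta_hat, eta_hat)
mean_two_right = posterior_mean([1, 1, None], beta_hat, eta_hat)
mean_two_wrong = posterior_mean([0, 0, None], beta_hat, eta_hat)
```

Because the omitted item enters through a term that depends on the ability parameter, the omission itself carries information about ability under this model, which is exactly what a fully specified omitting model is meant to capture.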



Inferences about Item Parameters 
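When not all item responses are observed, the (marginal) 
likelihood function for item parameters $\beta$ induced by the data 
matrix $v = (v_1, \ldots, v_N)$ from a sample of N examinees is 

$$L(\beta \mid v) = \prod_{i=1}^{N} \iiint p(u_{\mathrm{mis},i}, u_{\mathrm{obs},i}, m_i \mid \theta, \phi, \beta)\, p(\theta, \phi)\, du_{\mathrm{mis},i}\, d\theta\, d\phi$$

$$= \prod_{i=1}^{N} \iint p(v_i \mid \theta, \phi, \beta)\, p(\theta, \phi)\, d\theta\, d\phi, \tag{22}$$

where $u_{\mathrm{obs},i}$ and $u_{\mathrm{mis},i}$ are the observed and 
missing portions of the response vector of examinee $i$. The 
pseudo-likelihood obtained by ignoring the missingness process is 

$$L^*(\beta \mid u_{\mathrm{obs}}) = \prod_{i=1}^{N} \int p(u_{\mathrm{obs},i} \mid \theta, \beta)\, p(\theta)\, d\theta. \tag{23}$$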

Ignoring the missingness process when making direct 
likelihood inferences about $\beta$ means comparing the values of (23) 
rather than (22) at different values of $\beta$. Equation (23) differs 
from (22) by integrating over $p(u_{\mathrm{obs},i} \mid \theta, \beta)$ 
with respect to $\theta$ for each examinee, rather than over 
$p(v_i \mid \theta, \phi, \beta)$ with respect to $\theta$ and $\phi$. 
The resulting integrals are proportional with respect to $\beta$ if 
and only if, for all values of $\beta$, the conditions for ignorability 
for $\theta$ given $\beta$ are satisfied for Bayesian inference. Therefore, 



o Ignorability under direct (marginal) likelihood inference 
about $\beta$ holds if (i) ignorability under Bayesian inference 
holds for each $\theta$ conditional on $\beta$ and (ii) $\theta$ and 
$\phi$ are distinct in the sense required for direct likelihood 
inference. 
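The proportionality requirement can be checked numerically for a single examinee. The Python sketch below compares the full likelihood factor with the pseudo-likelihood factor at two hypothetical sets of item parameters. It assumes a two-parameter logistic model, a standard normal ability distribution, and, for simplicity, missingness mechanisms that depend at most on the response itself rather than on an examinee-level missingness parameter.

```python
import math
from itertools import product

STEP = 0.1
GRID = [-4.0 + STEP * k for k in range(81)]  # theta grid on [-4, 4]


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def response_prob(u_j, theta, beta_j):
    """2PL probability of response u_j (an assumed response model)."""
    a_j, b_j = beta_j
    p = logistic(a_j * (theta - b_j))
    return p if u_j == 1 else 1.0 - p


def pseudo_lik(u_obs, beta):
    """Pseudo-likelihood factor: observed items only; None marks missing."""
    total = 0.0
    for theta in GRID:
        like = 1.0
        for u_j, beta_j in zip(u_obs, beta):
            if u_j is not None:
                like *= response_prob(u_j, theta, beta_j)
        total += like * normal_pdf(theta) * STEP
    return total


def full_lik(u_obs, beta, miss_prob):
    """Full likelihood factor, summing over the missing responses.

    miss_prob(u_j) is a hypothetical P(response missing | U_j = u_j).
    """
    missing = [j for j, u_j in enumerate(u_obs) if u_j is None]
    total = 0.0
    for theta in GRID:
        inner = 0.0
        for fill in product((0, 1), repeat=len(missing)):
            u = list(u_obs)
            for j, u_j in zip(missing, fill):
                u[j] = u_j
            term = 1.0
            for j, u_j in enumerate(u):
                p_miss = miss_prob(u_j)
                term *= response_prob(u_j, theta, beta[j])
                term *= p_miss if u_obs[j] is None else 1.0 - p_miss
            inner += term
        total += inner * normal_pdf(theta) * STEP
    return total


def mcar(u_j):
    return 0.3  # missingness does not depend on the response


def response_dependent(u_j):
    return 0.6 if u_j == 0 else 0.1  # wrong answers go missing more often


def ratio(beta, miss_prob):
    return full_lik(U_OBS, beta, miss_prob) / pseudo_lik(U_OBS, beta)


U_OBS = [1, 0, None]
BETA_A = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
BETA_B = [(1.5, -0.5), (0.8, 0.3), (1.2, 0.0)]

gap_mcar = abs(ratio(BETA_A, mcar) - ratio(BETA_B, mcar))
gap_dep = abs(ratio(BETA_A, response_dependent) - ratio(BETA_B, response_dependent))
```

Under the constant mechanism the ratio of the two likelihoods is the same at both parameter settings, so comparing pseudo-likelihood values gives the same conclusions as comparing full likelihood values. Under the response-dependent mechanism the ratio changes with the item parameters, and the pseudo-likelihood is no longer a valid substitute.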



If $\beta$ is a priori independent of all $\theta$ and $\phi$, the 
correct Bayesian posterior for $\beta$ in the presence of missing data 
is proportional to the product of (22) and the prior $p(\beta)$. 
Ignoring the missingness process when making Bayesian inferences 
about $\beta$ means using instead the product of (23) and $p(\beta)$. 
The preceding result indicates when ignoring the missingness gives 
the correct likelihood. To obtain the correct posterior, then, we 
have: 

o Ignorability holds under Bayesian inference about $\beta$ if (i) 
ignorability holds under Bayesian inference for each $\theta$ 
conditional on $\beta$, and (ii) $\theta$ and $\phi$ are distinct in 
the sense required for Bayesian inference. 

If ignorability holds under direct likelihood inference, $L^*$ 
yields the correct value for $\hat\beta$. The necessary condition for 
sampling distribution inferences based on $\hat\beta$ requires that 
the probability of the missingness pattern (in this context, the 
distribution of counts of individual missingness patterns) not 
depend on the values of observed responses. This condition is 
implied by MCAR. When it and direct-likelihood ignorability hold, 
conditional sampling distribution inferences are appropriate. 
They pertain to sampling of item responses to the observed items 
from repeated subsamples of examinees with each observed 
missingness pattern, with subsample sizes fixed at the observed 
counts of those missingness patterns. 

Case 1. We have seen that for alternate test forms, 
ignorability holds for Bayesian inferences about $\theta$ given $\beta$. 
Random assignment of test forms ensures MCAR. Therefore the 
responses to items on forms that are not administered can be 
ignored under direct likelihood inference about $\beta$, and under 
Bayesian inference as well, as long as prior distributions are 
independent. Sampling-distribution inferences can be based on the 
MLE $\hat\beta$ with the understanding that they pertain to repeated 
sampling of examinees for each form in the sample sizes that were 
actually observed. 

Case 2. In targeted testing, items believed to be easier 
are administered to examinees who are expected to have lower 
abilities, and items believed to be harder are administered to 
examinees expected to have higher abilities. Bayesian inferences 
about $\theta$ given $\beta$ can ignore missingness only after 
conditioning on $y$, the collateral examinee variable used in 
test-form assignment. Correct inferences about $\beta$ under direct 
likelihood inference thus require that $p(\theta)$ in (23) be replaced 
by $p(\theta \mid y_i)$, as well as that all values of $\beta$ are 
possible (if not always likely) on all test forms. Bayesian 
inferences must additionally take into account the prior beliefs 
that led to the differential assignment of items to forms. Let 
$z = (z_1, \ldots, z_n)$ represent the collateral information about 
items used to make these assignments (e.g., pretest item 
difficulties or item content). Appropriate Bayesian 
inferences about $\beta$ that account for the missingness process may be 
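drawn from 

$$p(\beta \mid u_{\mathrm{obs}}, y, z) \propto p(\beta \mid z)\, L^*(\beta \mid u_{\mathrm{obs}}, y), \qquad L^*(\beta \mid u_{\mathrm{obs}}, y) = \prod_{i=1}^{N} \int p(u_{\mathrm{obs},i} \mid \theta, \beta)\, p(\theta \mid y_i)\, d\theta.$$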

As in Case 1, the distribution of counts of missingness patterns 
does not depend on the values of observed item responses, so 
conditional sampling-distribution inferences about $\beta$ from 
$L^*(\beta \mid u_{\mathrm{obs}}, y)$ are appropriate. They pertain to 
repeated administrations of the observed counts of administered 
forms at each value of $y$, to samples of examinees with those $y$ 
values. 

Case 3. Conclusions similar to those of Case 2 hold for 
direct likelihood and Bayesian inference in adaptive testing. 
Ignorability holds for Bayesian inference about $\theta$ given $\beta$ 
(again conditioning on $y$ if collateral examinee variables are used 
in item selection), so direct likelihood inferences about $\beta$ from 
$L^*$ are justified (Verhelst and Veldhuijzen, 1987), though not 
always satisfactory. The reason is that $p(\beta \mid z)$ is often 
very strong in practice; indeed, $\beta$ is usually treated as known 
for the purpose of item selection. In this case, $z_j$ could be the 
mean $\mu_j$ and covariance matrix of an assumed normal distribution 
for $\beta_j$, and item selection would be based on $\mu_j$. The data 
collected for a given item when it is administered adaptively tend 
to be from examinees in a relatively narrow band of ability. For 
binary items with more than one parameter, the number of examinees 
required for stable estimates may well exceed the number that can 
be tested in practice. Bayesian inference is preferable under 
these circumstances. Provisional item parameter estimates based 
on $p(\beta \mid z)$ may be used to administer items, then 
adaptively acquired item responses can be used to produce an 
updated distribution $p(\beta \mid u_{\mathrm{obs}}, z)$. 



Because the missingness process is ignorable under direct 
likelihood inference, the usual MLE $\hat\beta$ obtained by maximizing 
$L^*$ gives the correct point estimate for MLE-based 
sampling-distribution inferences about $\beta$. As in the section on 
estimating $\theta$ from adaptive tests, however, the necessary 
condition for ignorability under sampling-distribution inference is 
not satisfied: the probabilities of missingness patterns depend on 
the values of observed responses. MLE properties based on $L^*$ need 
not apply to item parameter estimates obtained from adaptive test 
data. It may be that for some adaptive test designs, the usual 
variance estimates with $M$ fixed at $m$ are good approximations to 
the variances that would be obtained under repeated sampling of 
the entire adaptive test for N examinees, but this must be 
determined case by case. 

Case 4. Recall that when some items at the end of a test are 
not reached, MAR holds for inferences about $\theta$ given $\beta$, but 
Bayesian ignorability does not hold unless speed and ability are 
independent. Missingness due to time limitations, therefore, is 
not generally ignorable under any type of inference about $\beta$. 
Assuming there are no restrictions on the parameter space, drawing 
likelihood inferences about $\beta$ requires one to replace 
$p(\theta)$ in (23) with $p(\theta \mid m_i)$, where 

$$p(\theta \mid m_i) \propto p(\theta) \int p(M = m_i \mid \phi)\, p(\phi \mid \theta)\, d\phi.$$
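The adjusted ability distribution can be computed on a grid once a speed model is chosen. The Python sketch below is illustrative only: it assumes that ability and speed are standard bivariate normal with correlation `rho`, and it uses a hypothetical stopping model in which each further item is reached with probability `logistic(phi + 1)`. Neither assumption comes from the text.

```python
import math

STEP = 0.1
GRID = [-5.0 + STEP * k for k in range(101)]  # shared grid for theta and phi


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def normal_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def p_reached(r, n_items, phi):
    """Hypothetical P(exactly r of n_items are reached | speed phi)."""
    s = logistic(phi + 1.0)
    return s ** r if r == n_items else s ** r * (1.0 - s)


def theta_given_m(r, n_items, rho):
    """Grid version of p(theta | m): prior times the integral over phi of
    p(m | phi) p(phi | theta), with phi | theta ~ N(rho * theta, 1 - rho^2)."""
    sd = math.sqrt(1.0 - rho * rho)
    dens = []
    for theta in GRID:
        inner = sum(
            p_reached(r, n_items, phi) * normal_pdf(phi, rho * theta, sd)
            for phi in GRID
        ) * STEP
        dens.append(normal_pdf(theta) * inner)
    norm = sum(dens)
    return [d / norm for d in dens]


def grid_mean(weights):
    return sum(t * w for t, w in zip(GRID, weights))


# An examinee who reached only 1 of 10 items.
mean_independent = grid_mean(theta_given_m(1, 10, rho=0.0))
mean_correlated = grid_mean(theta_given_m(1, 10, rho=0.6))
```

When speed and ability are independent (`rho = 0`), the integral does not depend on ability and the adjusted distribution reduces to the original one, which is the case in which the not-reached process can be ignored. With a positive correlation, reaching few items pulls the ability distribution downward.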

If, in addition, the test has been assembled to start with easy 
items and become harder, the prior information about items (say $z$) 
plays a role in determining which items examinees will not reach. 
This information must also be used in Bayesian inference about 
$\beta$. The appropriate posterior is 

$$p(\beta \mid v, z) \propto p(\beta \mid z) \prod_{i=1}^{N} \int p(u_{\mathrm{obs},i} \mid \theta, \beta)\, p(\theta \mid m_i)\, d\theta.$$

Case 5. The topic of inference about item parameters when 
examinees omit some items intentionally has already been broached 
in the discussion about estimating $\theta$. Bayesian ignorability for 
$\theta$ given $\beta$ does not generally hold, so missingness 
mechanisms must be specified and inferences about $\beta$ must start 
with the full likelihood (22). A number of approaches were discussed 
there, including imputing responses for omits, using a 
multiple-category IRT model, and fitting Lord's (1983) model for 
responses and omits. The approach that is most easily incorporated 
into standard IRT algorithms is to treat intentional omits as 
fractionally correct (Lord, 1974). Assuming that examinees omit 
only in accordance with the strategy that maximizes their expected 
score, this approach gives "marginal-conditional" MLEs that 
maximize 



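$$E_{u_{\mathrm{mis}} \mid u_{\mathrm{obs}}}\left[L_c(\beta \mid u_{\mathrm{mis}}, u_{\mathrm{obs}})\right] = \prod_{i=1}^{N} \int \prod_{j \in \mathrm{obs}_i} P_j(\theta)^{u_{ij}}\, Q_j(\theta)^{1-u_{ij}} \prod_{j \in \mathrm{mis}_i} P_j(\theta)^{c}\, Q_j(\theta)^{1-c}\; p(\theta)\, d\theta. \tag{24}$$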



Equation (24) is "conditional" in that it accounts for the 
influence of $\theta$ upon item responses given the pattern of 
missingness, but does not capture the role of $\theta$ in determining 
that pattern. It is "marginal" in that it is the expectation over 
$u_{\mathrm{mis}}$ given $u_{\mathrm{obs}}$ of the conditional likelihood 
$L_c(\beta \mid u)$ that 
would be maximized if all responses had been observed. 
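The fractionally-correct treatment is easy to express in code. The Python sketch below assumes that each omitted item contributes a factor $P_j(\theta)^{c} Q_j(\theta)^{1-c}$, with $c$ taken as the chance level (0.25 here, as for a hypothetical four-option item), together with a two-parameter logistic model and a standard normal ability distribution. These are illustrative assumptions, not a statement of the exact form used by any particular program.

```python
import math

STEP = 0.1
GRID = [-4.0 + STEP * k for k in range(81)]  # theta grid on [-4, 4]


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def frac_likelihood(v, beta, c):
    """One examinee's marginal factor with omits scored as c.

    v[j] is 0 or 1 for an observed response and None for an omit.
    """
    total = 0.0
    for theta in GRID:
        like = 1.0
        for v_j, (a_j, b_j) in zip(v, beta):
            p = logistic(a_j * (theta - b_j))
            score = c if v_j is None else float(v_j)
            like *= p ** score * (1.0 - p) ** (1.0 - score)
        total += like * normal_pdf(theta) * STEP
    return total


def log_likelihood(V, beta, c):
    """Sum of log factors over examinees; the quantity an MLE would maximize."""
    return sum(math.log(frac_likelihood(v, beta, c)) for v in V)


# Hypothetical item parameters and response patterns.
beta = [(1.0, -0.5), (1.2, 0.5)]
V = [[1, None], [0, 0], [1, 1]]
loglik = log_likelihood(V, beta, c=0.25)

# Boundary checks: scoring an omit as 1 (or 0) reproduces a right (or wrong) answer.
as_right = frac_likelihood([1, None], beta, c=1.0)
as_wrong = frac_likelihood([1, None], beta, c=0.0)
```

Because an omit is handled as just another item score, the same routine that evaluates complete-data likelihoods can be reused without modification, which is the practical appeal of the approach.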



Summary 

In practical applications of item response theory (IRT), 
there are several reasons that item responses may not be observed 
from all examinees to all test items. Ignoring the missingness 
process under direct likelihood inference means using a 
pseudo-likelihood that includes terms for only the responses that 
were observed, without regard for the processes by which they came 
to be observed. The resulting inferences are appropriate if the 
pseudo-likelihood is proportional to the correct likelihood that 
does account for the missingness process. In this case the 
correct point estimate of an MLE is obtained. Sampling-distribution 
inferences from the MLE are appropriate only if the 
missingness pattern does not depend on the values of the observed 
data. When this condition holds, sampling-distribution inferences 
can be drawn with regard to repeated samples of responses under 
the observed pattern of missingness. The missingness process is 
ignorable with respect to Bayesian inference if the correct 
Bayesian posterior is proportional to the product of the 
pseudo-likelihood and an appropriate prior distribution. 

For five common types of missingness in IRT, we used Rubin's 
(1976) theorems to determine whether ignorability holds under 
direct likelihood and Bayesian inference about examinee parameters 
$\theta$ when item parameters $\beta$ are known. In those cases in 
which the correct value of the MLE is obtained under direct 
likelihood inference, we asked whether sampling distribution 
inferences based on the MLE were appropriate. We then considered 
the analogous questions for inferences about $\beta$ when the examinee 
parameters are eliminated by marginalization. Our findings are 
summarized below. Tables 1 and 2 highlight the results on 
ignorability. 

Case 1: Alternate test forms. When an examinee is assigned one of 
several alternative test forms by a random process such as a coin 
flip or a spiralling scheme, the process that renders missing the 
responses to items on the forms not presented is ignorable for all 
three types of inference, both for estimating $\beta$ and for 
estimating $\theta$ when $\beta$ is known. 

Case 2: Targeted testing. When collateral variables such as 
educational or demographic status are used to assign an examinee 
one of several tests that differ in their measurement properties, 
the resulting missingness on forms not given is ignorable under 
direct likelihood inference for $\theta$ given $\beta$, but not under 
Bayesian inference unless the prior information about examinees 
that led to differential assignments is conditioned on. This 
information must be taken into account for both likelihood and 
Bayesian inferences about $\beta$; for Bayesian inference, prior 
information about $\beta$ used to select items must additionally be 
taken into account. Sampling distribution inferences may be based 
on MLEs for $\beta$ and for $\theta$ given $\beta$, conditional on the 
observed patterns of test form administration within values of the 
examinee variables used for targeting. 

Case 3: Adaptive testing. The same conclusions for direct 
likelihood and Bayesian inference follow in adaptive testing, 
where assignment proceeds item by item in accordance with the 
values of responses to preceding items. Ignorability under direct 
likelihood inference means that the correct points are identified 
as MLEs of $\theta$ given $\beta$ and of $\beta$, but the usual MLE 
properties under sampling-distribution inference need not hold 
because the probabilities of missingness patterns depend on the 
values of observed responses. 

Case 4: Not-reached items. When some examinees do not interact 
with the last items on a nearly nonspeeded test, the not-reached 
process is ignorable with respect to direct likelihood inference 
about $\theta$ given $\beta$, and the MLE supports sampling 
distribution inferences that pertain to repeated administrations 
of the items that were actually reached. This missingness process 
is not ignorable under Bayesian inference unless speed and ability 
are independent, and only then can direct (marginal) likelihood 
inferences about $\beta$ ignore the missingness. Bayesian inferences 
about $\beta$ further require that prior knowledge about item 
parameters be employed if it played a role in determining which 
items would not be reached, as when items are ordered from easy to 
hard. 

Case 5: Omitted items . When examinees are presented items, have a 
chance to appraise their content, and decide for their own reasons 
not to respond, the missingness is not ignorable. Inferences must 
be drawn from a full model for the joint distribution of 
missingness and item response. 

Not surprisingly, modeling this nonignorable nonresponse is 
difficult. Neither of the two most ambitious approaches proposed 
to date, namely Lord's (1983) model for omits and the use of 
multiple-category IRT models, handles the issue of local 
independence in a fully satisfactory manner. Under the 
assumption that examinees are perfect judges of their chances of 
responding correctly, and omit only if it is in accordance with 
the strategy that maximizes their expected score, Lord's (1974) 
treatment of omits as fractionally correct can be justified as 
providing the expectation of a conditional term in the full 
likelihood. This procedure is readily incorporated into standard 
complete-data IRT algorithms and avoids having to specify the 
full likelihood, but sacrifices information about examinee and 
item parameters conveyed by the observed pattern of missingness. 

Table 1 

Ignorability Results for Estimating $\theta$ Given $\beta$ 

                                         Type of Inference 
Type of 
Missingness             Direct Likelihood    Bayesian                   Sampling Distribution* 

Alternate Forms         Yes                  Yes                        Yes 

Targeted Forms          Yes                  Yes, given                 Yes 
                                             examinee variables 

Adaptive Testing        Yes                  Yes, given                 No 
                                             examinee variables 
                                             if they are used 

Not-Reached             Yes                  No, unless speed and       Yes 
                                             ability are independent 

Intentional Omissions   No                   No                         No 

* Conditional on the observed pattern of missingness. 

Table 2 

Ignorability Results for Estimating $\beta$ After Marginalizing over $\theta$ 

                                         Type of Inference 
Type of 
Missingness             Direct Likelihood    Bayesian                   Sampling Distribution* 

Alternate Forms         Yes                  Yes                        Yes 

Targeted Forms          Yes, given           Yes, given                 Yes, given 
                        examinee variables   examinee and item          examinee variables 
                                             variables 

Adaptive Testing        Yes, given           Yes, given                 No 
                        examinee variables   item variables and 
                        if they are used     examinee variables 
                                             if they are used 

Not-Reached             No, unless speed     No, unless speed           No, unless speed 
                        and ability are      and ability are            and ability are 
                        independent          independent                independent 

Intentional Omissions   No                   No                         No 

* Conditional on the observed pattern of missingness. 